ABSTRACT

This paper investigates the use of individual cross section data to
describe macro functions. Necessary and sufficient conditions (denoted
AS) are found for OLS slope coefficients from a cross section to
consistently estimate the first derivatives of the macro function. AS
embodies both known sets of aggregation assumptions, linear aggregation
and sufficient statistics, and thus represents generalized aggregation
conditions. A methodology is given for estimating second order
derivatives of the macro function from cross section data for distributions
of the exponential family, which extends to higher order derivatives.
Finally, a general test of linear aggregation schemes is described.

Statistical Aggregation Analysis:
Characterizing Macro Functions with Cross Section Data*


1.   INTRODUCTION 

It  is  common  practice  in  the  study  of  macroeconomic  relations  to  derive 
a  model  among  the  relevant  variables  based  on  individual  behavior,  and  then 
estimate  the  model's  parameters  using  time  series  data  on  the  averages  of  those 
variables  across  the  population.   Such  estimated  relations  are  justified  as 
describing  the  behavior  of  an  individual  with  "representative"  values  of  the 
predictor  variables. 

In general, the true macro relation between averaged data results from
the process of integrating (averaging) the true individual behavioral function
over the distribution of its arguments in the population. Even in the simplest
consumption function regression of average consumption on average income, one
is only capturing the statistical relation between two summary statistics of
the underlying consumption-income distribution. Unless saving behavior is
virtually identical across individuals or the structure of the income distribution
can be simply represented, an average consumption-average income regression will not
adequately describe the structure relating average consumption to the income
distribution. In this sense, any macro function in the form of an individual
behavioral relation is likely to ignore important distributional influences.²

A consistent model of such a macro relation thus requires both the
specification of the individual behavioral function and the population distribution
of its arguments. Only if the analyst resorts to restrictive assumptions
provided by aggregation theory, such as linearity in the individual behavioral
function, can the requirement of fully specifying the behavioral function and
distribution forms be relaxed. In addition, just stating such restrictions and
proceeding to estimate with average data only provides a weak basis for the
macro relation form, as any violation of the underlying restrictions will alter
it. For instance, if the true individual function is nonlinear, then in general
the macro relation between averages will differ in form from the individual
function, with the true macro relation form heavily dependent on the actual
distribution of individual variables.

Often there are available cross section data - individual data on the
components of the averages - for one or more time periods of the study. If these
data represent a random sampling of the population, then in principle both the
micro behavioral relation and the underlying population distribution can be
empirically characterized. However, this process is likely to be imprecise,
leaving large portions of the observed data configuration unexplained.

The initial purpose of this paper is to discover when simple statistical
analysis applied to cross section data - namely ordinary least squares (OLS)
regression analysis - can reveal partial information about the true macro
relation without recourse to specific micro functional form or distribution
form assumptions. We find that the slope coefficients from an OLS regression
on cross section data will consistently estimate the first derivatives of
the true macro function if and only if a certain property holds, called
asymptotic sufficiency (AS) of the average predictor variables for the average
dependent variable. This is shown in Section 3, after the notation and basic
assumptions are given in Section 2.

Because of the importance of aggregation theory in the consistent
formation of a macro function, we next investigate the relation of AS to the
two major blocks of aggregation assumptions appearing in the literature: the
linear (exact and consistent) aggregation approaches of economics⁴ and the
theory of sufficiency in the statistical literature. These approaches are
reviewed in conjunction with some illustrative examples in Section 4. We find
that the AS property contains both types of aggregation assumptions as special
cases, and thus AS represents a generalized aggregation condition. Next a
characterization theorem for AS is proven which shows linear aggregation (which
uses only functional form assumptions) and sufficient statistics (which uses
only distribution assumptions) as polar cases under which AS holds, with
intermediate cases showing the trade-offs required between making distribution and
functional form assumptions under AS.

When the average predictor variables are sufficient statistics for the
parameters of the underlying distribution, the true macro function can be
nonlinear in form. In Section 5 we present a methodology for estimating all higher
order derivatives of the true macro relationship from cross section moments,
when the distribution is a member of the exponential family. We present
explicitly the formulae for second order derivatives. Finally, these formulae
give rise to a general test of linear aggregation approaches, relying only on
the existence of certain population moments.

In  short,  this  paper  investigates  the  use  of  simple  statistical  techniques 
as  applied  to  cross  section  data  to  characterize  the  true  macro  relation,  termed 
"statistical  aggregation  analysis"  in  the  title.   These  techniques  provide 
information  about  the  macro  function  based  on  relatively  weak  assumptions,  which 
can  either  be  used  to  judge  specific  modeling  assumptions  or  pooled  with  averaged 
data  in  a  joint  estimation  process. 

2.   PRELIMINARIES

For a complete discussion of the issues addressed in this paper, a very
general specification of the population structure underlying a macro relation
is required. However, in order to direct attention to the distributional
influences on macro relations, which provide the focus of results in Section 3.1,
we present slightly simplified background assumptions and notation. In Section
3.2 these assumptions are relaxed and the results reinterpreted.

We begin by assuming that there is a large population of individuals in $T$
time periods, with periods indexed by $t = 1,\ldots,T$. There are $N^t$ individuals in
period $t$, indexed by $n = 1,\ldots,N^t$. For each agent $n$ in period $t$, there is a
vector of personal attributes $A_n^t$. For given $t$, $A_n^t$ is assumed to capture all
differences in individual agents, whether observable or not. Also, for each
agent $n$ in period $t$ there is a dependent quantity $x_n^t$, which is determined by
$A_n^t$ via

$$x_n^t = f(A_n^t) \tag{2.1}$$

$f$, the individual behavioral relation, is assumed here not to vary with $t$, an
unnecessary restriction which is relaxed in Section 3.2.

Now for each $t$ the set $\{A_n^t \mid n = 1,\ldots,N^t\}$ may be considered as a random sample
from a distribution with density $p(A \mid \theta^t)$. $\theta^t = (\theta_1^t,\ldots,\theta_L^t)'$ is an $L$-vector of
parameters which account for all changes in the underlying distribution $p(A \mid \theta^t)$
over time $t$. We denote the parameter space of $\theta^t$ as $\Gamma$, where
$\Gamma = \{\theta \in R^L \mid p(A \mid \theta) \text{ is a density}\}$ and $R^L$ is $L$-dimensional Euclidean space.

For each period $t$, the following average statistics are observed

$$\bar{x}^t = \frac{\sum_{n=1}^{N^t} x_n^t}{N^t}; \qquad \bar{V}_m^t = \frac{\sum_{n=1}^{N^t} v_m(A_n^t)}{N^t}; \qquad m = 1,\ldots,M \tag{2.2}$$

where $v_m(A_n^t)$, $m = 1,\ldots,M$ are observable functions of $A_n^t$. The vector
$(v_1(A_n^t),\ldots,v_M(A_n^t))'$ is denoted as $v(A_n^t)$ and the vector $(\bar{V}_1^t,\ldots,\bar{V}_M^t)'$ as $\bar{V}^t$.
Our primary interest here is in the relation between $\bar{x}^t$ and $\bar{V}^t$, referred to as
the macro relation, which arises from the micro functional form $f$ and the
distributional form $p$. We now proceed to characterize this relation.

We first make an assumption concerning the population structure.

ASSUMPTION A1: All first and second order moments of $x_n^t$ and $v(A_n^t)$ exist given
$t$, and the covariance matrix of $v(A_n^t)$ is nonsingular.


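As notation, denote

$$
\begin{aligned}
E(x \mid \theta^t) &= \int f(A)\, p(A \mid \theta^t)\, dA = \phi(\theta^t) \\
E(v(A) \mid \theta^t) &= g(\theta^t) = \mu_v^t \\
E\big((x - \phi(\theta^t))^2 \mid \theta^t\big) &= \sigma_{xx}^t \\
E\big((x - \phi(\theta^t))(v(A) - \mu_v^t) \mid \theta^t\big) &= \Sigma_{xv}^t \\
E\big((v(A) - \mu_v^t)(v(A) - \mu_v^t)' \mid \theta^t\big) &= \Sigma_{vv}^t
\end{aligned}
\tag{2.3}
$$
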

In (2.3) the means of $x_n^t$ and $v(A_n^t)$ given $t$ are written as functions of
$\theta^t$. In order to ascertain the large sample relationship between $\bar{x}^t$ and $\bar{V}^t$, we
reparameterize $E(x \mid \theta^t) = \phi(\theta^t)$ in terms of $\mu_v^t$. For this we require

ASSUMPTION A2: $L = M$, and $\mu_v^t = g(\theta^t)$ is invertible in $\theta^t$. Moreover, the range
of $g$, i.e. $\{g(\theta) \mid \theta \in \Gamma\}$, contains an open convex set $\Phi \subset R^M$, with the realized values
$\mu_v^1 = g(\theta^1),\ldots,\mu_v^T = g(\theta^T)$ interior points of $\Phi$.

Assumption A2 is mainly made for convenience, and is relaxed somewhat in
Section 3.2.

Performing this inversion, we can reparameterize $p(A \mid \theta^t)$ as
$p^*(A \mid \mu_v^t) = p(A \mid g^{-1}(\mu_v^t))$, so that mean $x_n^t$ in period $t$ appears as

$$E(x \mid \theta^t) = \phi(g^{-1}(\mu_v^t)) = \phi^*(\mu_v^t) \tag{2.4}$$

$\phi^*$ represents the correct large sample relationship between $\bar{x}^t$ and $\bar{V}^t$,
because by the Weak Law of Large Numbers,⁷

$$\operatorname*{plim}_{N^t \to \infty} \bar{x}^t = \phi^*(\mu_v^t); \qquad \operatorname*{plim}_{N^t \to \infty} \bar{V}^t = \mu_v^t \tag{2.5}$$

so that if $N^t$ is large,

$$\bar{x}^t \cong \phi^*(\bar{V}^t) \tag{2.6}$$

represents the correct macro relation between $\bar{x}^t$ and $\bar{V}^t$ over all time periods.
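
The distinction between the macro relation (2.6) and the micro function can be checked numerically. The sketch below is only an illustration under assumptions that do not appear in the text: a scalar attribute with A ~ N(mu, 1), v(A) = A, and a hypothetical micro function f(A) = A^2. Under these assumptions the macro function is phi*(mu) = mu^2 + 1, so the period averages should trace phi*(V_bar) rather than f(V_bar).

```python
import random

def simulate_period(mu, n_agents, rng):
    """Return (x_bar, v_bar) for one period, assuming A ~ N(mu, 1), x = A**2, v(A) = A."""
    attrs = [rng.gauss(mu, 1.0) for _ in range(n_agents)]
    x_bar = sum(a * a for a in attrs) / n_agents
    v_bar = sum(attrs) / n_agents
    return x_bar, v_bar

def phi_star(mu):
    """Macro function implied by the assumed f and p: E(A^2) = mu^2 + 1."""
    return mu * mu + 1.0

rng = random.Random(0)  # fixed seed so the run is reproducible
periods = [simulate_period(mu, 200_000, rng) for mu in (0.5, 1.0, 2.0)]
for x_bar, v_bar in periods:
    # Compare the observed average with the macro function and with f applied to the mean.
    print(f"V_bar={v_bar:.3f}  x_bar={x_bar:.3f}  "
          f"phi*(V_bar)={phi_star(v_bar):.3f}  f(V_bar)={v_bar ** 2:.3f}")
```

With a large number of agents per period, x_bar should sit close to phi*(V_bar) and roughly one unit above f(V_bar), which is the distributional influence that a "representative individual" reading of the averages would miss.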

Our final background assumption is

ASSUMPTION A3: $\nabla_{\mu_v} \phi^*$ exists for all $\mu_v \in \Phi$, where $\nabla$ denotes the gradient
operator.


In addition to the macro data (2.2), we also observe a random sample
of $K$ agents in a particular period $t_0$; a cross section data base. We index
members of this sample by $k = 1,\ldots,K$, and therefore have as data $x_k^{t_0}$, $v(A_k^{t_0})$,
$k = 1,\ldots,K$.⁸ We assume that $K$ is smaller than $N^{t_0}$, but still large enough
to employ large sample statistical results.⁹ In this paper our main concern
is what can be learned from this sample about $\phi^*$, the macro function. In
particular, in the next section we establish necessary and sufficient conditions
for the slope coefficients $b_K$ obtained from regressing $x_k^{t_0}$ on $v(A_k^{t_0})$, $k = 1,\ldots,K$
(and a constant) to consistently estimate the derivatives of $\phi^*$; $\nabla_{\mu_v} \phi^*$. By
standard methods, we have that

$$\operatorname*{plim}_{K \to \infty} b_K = (\Sigma_{vv}^{t_0})^{-1}\, \Sigma_{xv}^{t_0} \tag{2.7}$$

This  concludes  the  presentation  of  the  basic  framework  and  notation. 
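
The moment form (2.7) can be sketched directly in code. The function below computes the sample analogue of the inverse covariance matrix of v(A) times the covariance of x with v(A) for M = 2 predictors, using only the standard library. The data-generating process underneath it (coefficients 0.8 and -0.3, the particular distributions, and K = 50,000) is hypothetical and serves only to exercise the formula.

```python
import random

def cov(u, w):
    """Sample covariance of two equal-length sequences."""
    n = len(u)
    mu_u, mu_w = sum(u) / n, sum(w) / n
    return sum((a - mu_u) * (b - mu_w) for a, b in zip(u, w)) / (n - 1)

def ols_slopes_2(x, v1, v2):
    """Slope vector S_vv^{-1} S_xv from regressing x on (constant, v1, v2)."""
    s11, s12, s22 = cov(v1, v1), cov(v1, v2), cov(v2, v2)
    s1x, s2x = cov(v1, x), cov(v2, x)
    det = s11 * s22 - s12 * s12  # nonzero when S_vv is nonsingular (Assumption A1)
    b1 = (s22 * s1x - s12 * s2x) / det
    b2 = (s11 * s2x - s12 * s1x) / det
    return b1, b2

# Hypothetical cross section of K agents with correlated predictors.
rng = random.Random(0)
K = 50_000
v1 = [rng.gauss(10.0, 2.0) for _ in range(K)]
v2 = [0.5 * a + rng.gauss(0.0, 1.0) for a in v1]
x = [1.0 + 0.8 * a - 0.3 * b + rng.gauss(0.0, 1.0) for a, b in zip(v1, v2)]

b1, b2 = ols_slopes_2(x, v1, v2)
print(b1, b2)
```

Because the slopes depend on the data only through second moments, the same function applies whatever the form of f; what the slopes estimate in the nonlinear case is the subject of the next section.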

As stated in the introduction, if $f$ and $p$ are known, then an integration
process (in principle) yields the correct macro relation $\phi^*$, whose parameters
could then be estimated using average data observed over time $t$. However,
in general, $f$ and $p$ will not be known with certainty. Even if a form $f$ is
suggested by economic theory, $\phi^*$ will depend in form on the choice of $p$, unless
$f$ satisfies consistent (linear) aggregation restrictions. In any
realistic model indicating differences between individuals, $A_n^t$ and/or
$v(A_n^t)$ will be a large vector, usually making certainty about the form
of $f$ or $p$ unwarranted. Recall $p(A \mid \theta^t)$ is the joint distribution of all
relevant individual attributes. To reiterate, the overall aim of this paper
is to study the conditions under which a cross section data base, as a reflection
of both $f$ and $p$ structures, can through simple statistical techniques provide
information about $\phi^*$.

Our general notation provides for a distinction between the underlying
behavioral attributes $A_n^t$ and the observable variables $v(A_n^t)$. If $x_n^t$ depends
directly on $v(A_n^t)$ through $f$; i.e. there exists $f^*$ such that $f(A_n^t) = f^*(v(A_n^t))$,
then no such distinction is required. However, included in our general analysis
are situations where $\bar{V}^t$ is the relevant predictor of $\bar{x}^t$ through assumptions on
$p$ only.¹² In general $A_n^t$ is used to represent all dimensions on which individuals
differ, and therefore linear models with random coefficients, standard
disturbance terms, etc., can all be embodied in this framework, in addition to the
predictor variables $v(A_n^t)$.

For concreteness, suppose that consumer demand is studied. $x_n^t$ can represent
the demand for a particular commodity by family $n$ in year $t$, $v_1(A_n^t)$ family
income, $v_2(A_n^t)$ family size and $v_3(A_n^t)$ a qualitative variable indicating
whether the family has a rural residence. $\bar{x}^t$ is average quantity demanded in year
$t$, $\bar{V}_1^t$ average income, $\bar{V}_2^t$ average family size and $\bar{V}_3^t$ the percentage of families
with rural residences. Our framework covers both aggregation schemes where
$x_n^t$ is functionally related to $v(A_n^t)$ or where $\bar{V}^t$ describes movements in the
underlying distribution sufficiently to determine $\bar{x}^t$ movements over time.


3.   MICRO  REGRESSIONS  AND  MACRO  FUNCTIONS 

3.1  The Basic Results

In this section we characterize the conditions under which the micro
slope coefficients $b_K$ of (2.7) will consistently estimate the first derivatives
of the macro function $\phi^*$ with respect to $\mu_v$. For the majority of this section
we consider only the time period $t_0$, and so the time superscripts are omitted.

Of central importance to this inquiry is the conditional expectation of
$\bar{x}$ given $\bar{V}$, denoted $\tilde{x}$:

$$\tilde{x} = E(\bar{x} \mid \bar{V}) \tag{3.1}$$

In general, $\tilde{x}$ is a function of $2M+1$ arguments, $\bar{V}$, $\mu_v$ and $N$, so that we write

$$\tilde{x} = \tilde{x}(\bar{V}, \mu_v, N) \tag{3.2}$$

$\tilde{x}$ is required to obey some regularity conditions, as summarized in

ASSUMPTION A4: $\tilde{x}$ exists and is continuous and differentiable in $\bar{V}$,¹⁴ and $\nabla_{\bar{V}} \tilde{x}$
approaches a finite limit $G(\mu_v) \neq 0$ as $N$ approaches infinity and $\bar{V}$ approaches
$\mu_v$.



We can obtain the following result concerning the large sample
behavior of $\bar{x}$, $\bar{V}$, and $\tilde{x}$.

Lemma 3.1:

a)   Under Assumptions A1 and A2, we have that as $N$ increases

$$\operatorname{plim} \bar{x} = \phi^*(\mu_v); \qquad \operatorname{plim} \bar{V} = \mu_v$$

and that the asymptotic distribution of

$$\sqrt{N} \begin{pmatrix} \bar{x} - \phi^*(\mu_v) \\ \bar{V} - \mu_v \end{pmatrix}$$

is multivariate normal with mean zero and variance covariance matrix

$$\begin{pmatrix} \sigma_{xx} & \Sigma_{xv}' \\ \Sigma_{xv} & \Sigma_{vv} \end{pmatrix}$$

b)   Under Assumptions A1, A2 and A4, as $N \to \infty$

$$\sqrt{N}\,\big(\tilde{x}(\bar{V}, \mu_v, N) - \phi^*(\mu_v)\big)$$

converges in distribution to

$$\sqrt{N}\,(\bar{V} - \mu_v)'\, G(\mu_v)$$

Proof: Part a) is a standard application of the Weak Law of Large Numbers and
the Central Limit Theorem. Part b) is shown in the Appendix.

QED.


We are now in a position to show the first important result:

Theorem 3.2: Consider the micro slope coefficients $b_K$ obtained by
regressing $x_k$ on $v(A_k)$ (and a constant) in a cross section random sample.
Under Assumptions A1, A2, and A4, we have $\operatorname{plim} b_K = G(\mu_v)$.

Proof: Multiply $\sqrt{N}(\bar{x} - \phi^*(\mu_v))$ by $\sqrt{N}(\bar{V} - \mu_v)$ and take the expectation, giving

$$E\big(N(\bar{x} - \phi^*(\mu_v))(\bar{V} - \mu_v)\big) = \Sigma_{xv}$$

which expands as

$$
\begin{aligned}
\Sigma_{xv} &= E\big(N(\tilde{x} - \phi^*(\mu_v))(\bar{V} - \mu_v)\big) + E\big(N(\bar{x} - \tilde{x})(\bar{V} - \mu_v)\big) \\
&= E\big(N(\tilde{x} - \phi^*(\mu_v))(\bar{V} - \mu_v)\big)
\end{aligned}
$$

where the second term vanishes by first conditioning on $\bar{V}$ and then taking
the overall expectation. We also clearly have that

$$E\big(N(\bar{V} - \mu_v)(\bar{V} - \mu_v)'\big) = \Sigma_{vv}$$

Applying Lemma 3.1 b), we obtain the equality

$$\lim E\big(N(\tilde{x} - \phi^*(\mu_v))(\bar{V} - \mu_v)\big) = \lim E\big(N(\bar{V} - \mu_v)(\bar{V} - \mu_v)'\big)\, G(\mu_v)$$

or, in view of the above developments

$$\Sigma_{xv} = \Sigma_{vv}\, G(\mu_v)$$

which, from (2.7) and the assumption that $\Sigma_{vv}$ is nonsingular, gives

$$\operatorname{plim} b_K = G(\mu_v)$$

QED.

From applying results of the Central Limit Theorem, we have just shown
that OLS slope coefficients from a randomly sampled cross section will
consistently estimate the large sample derivatives of the "average" regression
function $\tilde{x} = E(\bar{x} \mid \bar{V})$ with respect to $\bar{V}$. This is a very general result,
relying only on the regularity properties of Assumptions A1, A2 and A4, which
concern $\tilde{x}$ and the population distribution $p$.¹⁷

In order to relate this result to the derivatives of $\phi^*$, we begin by noting
the pointwise convergence of the function $\tilde{x}$ to $\phi^*$ implicit in Lemma 3.1 b):

$$\lim_{N \to \infty} \tilde{x}(\mu_v, \mu_v, N) = \phi^*(\mu_v) \tag{3.3}$$

where the argument $\bar{V}$ has been set to $\mu_v$. Theorem 3.2 relates the regression
coefficients to the large sample derivatives of $\tilde{x}$ with regard to the first
argument only. Because of this we must be very specific about the role of $\bar{V}$,
the first argument in $\tilde{x}$. To this end we introduce an $M$ vector of dummy arguments
$\psi$ and rewrite $\tilde{x}$ as

$$\tilde{x} = \tilde{x}(\psi, \mu_v, N)\big|_{\psi = \bar{V}} \tag{3.4}$$

This allows us to discriminate changes in the first argument $\psi$ as $N \to \infty$ from
changes in the second argument $\mu_v$, avoiding the problem of $\bar{V}$ approaching $\mu_v$
in probability as $N \to \infty$.

Using this notation, we also have

$$\nabla_{\bar{V}} \tilde{x} = \nabla_\psi \tilde{x}(\psi, \mu_v, N)\big|_{\psi = \bar{V}} \tag{3.5}$$

and pointwise convergence as in

$$\lim_{N \to \infty} \nabla_\psi \tilde{x}(\mu_v, \mu_v, N) = G(\mu_v) \tag{3.6}$$

Now, in order to remove some pathological cases from the analysis, we adopt
the following assumption on $\nabla_\psi \tilde{x}$, the gradient vector with regard to the first
set of arguments $\psi$, and $\nabla_{\mu_v} \tilde{x}$, the gradient vector with regard to the second
set of arguments $\mu_v$.

ASSUMPTION A5: $\nabla_\psi \tilde{x}$ converges uniformly¹⁸ to a vector function $G^{**}(\psi, \mu_v)$ as
$N \to \infty$. Also, $\nabla_{\mu_v} \tilde{x}$ exists and converges uniformly to a vector function $H(\psi, \mu_v)$
as $N \to \infty$.


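A5 implies that $\tilde{x}$ converges to a function $\phi^{**}(\psi, \mu_v)$ as $N \to \infty$. From (3.6) and
(3.3) we have that

$$\phi^{**}(\mu_v, \mu_v) = \phi^*(\mu_v) \tag{3.7}$$

$$G^{**}(\mu_v, \mu_v) = G(\mu_v)$$

and by the uniform convergence assumption¹⁹

$$\nabla_\psi \phi^{**}(\psi, \mu_v) = G^{**}(\psi, \mu_v) \tag{3.8}$$

so $\nabla_\psi \phi^{**}(\mu_v, \mu_v) = G(\mu_v)$, and

$$\nabla_{\mu_v} \phi^{**}(\psi, \mu_v) = H(\psi, \mu_v) \tag{3.9}$$

We can now decompose the gradient of the macro function $\phi^*$ with respect to $\mu_v$
(via (3.7)), differentiating with respect to both sets of arguments, as

$$\nabla_{\mu_v} \phi^*(\mu_v) = \nabla_\psi \phi^{**}(\mu_v, \mu_v) + \nabla_{\mu_v} \phi^{**}(\mu_v, \mu_v) = G(\mu_v) + H(\mu_v, \mu_v) \tag{3.10}$$
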
In view of this discussion, we have shown

Theorem 3.3: Under Assumptions A1, A2, A3, A4 and A5

$$\operatorname{plim} b_K = \nabla_{\mu_v} \phi^*(\mu_v)$$

if and only if $H(\mu_v, \mu_v) = 0$.


Thus, at a given point in time, the micro regression coefficients $b_K$
will consistently estimate the first derivatives of the macro function $\phi^*(\mu_v)$
if and only if $\nabla_{\mu_v} \phi^{**}$ vanishes, where $\nabla_{\mu_v} \phi^{**}$ is the gradient of $\phi^{**}$ with
regard to its second set of arguments. For such slope coefficients to always
consistently estimate the first derivatives of $\phi^*$, we must require that $\nabla_{\mu_v} \phi^{**}$
vanish at all parameter points, i.e. that $\phi^{**}$ can be written without reference to its
second argument $\mu_v$. Thus there exists a function $\phi^{***}$ of $M$ arguments, such that

$$\phi^{**}(\psi, \mu_v) = \phi^{***}(\psi) \tag{3.11}$$

In view of (3.7),

$$\phi^{***}(\mu_v) = \phi^*(\mu_v) \tag{3.12}$$

or that $\phi^{***}$ and $\phi^*$ are the same function.

This condition is important enough to merit a name (where we return to using
$t$ superscripts).

Definition 1: $\bar{V}^t$ is asymptotically sufficient for determining $\bar{x}^t$ if for all
$\psi \in \Phi$ and $\mu_v^t \in \Phi$

$$\lim_{N^t \to \infty} \tilde{x}(\psi, \mu_v^t, N^t) = \phi^*(\psi) \tag{3.13}$$

This property is abbreviated as AS in the rest of the exposition.
We can summarize the preceding discussion as

Theorem 3.4: Assume A1, A2, A3, A4 and A5. Let $\mu_v^t \in \Phi$, so that $p^*(A \mid \mu_v^t)$ is
the population distribution in period $t$. OLS slope coefficients from a random
sample cross section in period $t$ consistently estimate $\nabla_{\mu_v} \phi^*$ evaluated at $\mu_v^t$
for all $\mu_v^t \in \Phi$, if and only if AS holds, i.e. $\bar{V}^t$ is asymptotically sufficient
for determining $\bar{x}^t$.
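
A worked special case may help fix ideas. The following sketch is an illustration under assumptions that are not taken from the text: a scalar attribute $A$ drawn from an exponential distribution with mean $\mu_v$, $v(A) = A$, and a hypothetical micro function $f(A) = A^2$. In this case the sample mean is a sufficient statistic for $\mu_v$, and the cross section slope limit (2.7) coincides with the derivative of the macro function, in line with Theorem 3.4.

```latex
% Assumed for illustration only: A ~ exponential with mean \mu_v, v(A) = A, f(A) = A^2,
% so that E(A^k) = k!\,\mu_v^k.
\begin{align*}
\phi^*(\mu_v) &= E(A^2) = 2\mu_v^2, \qquad \nabla_{\mu_v}\phi^*(\mu_v) = 4\mu_v \\
\Sigma_{xv} &= E(A^3) - E(A^2)\,E(A) = 6\mu_v^3 - 2\mu_v^3 = 4\mu_v^3, \qquad \Sigma_{vv} = \mu_v^2 \\
\operatorname{plim} b_K &= \Sigma_{vv}^{-1}\,\Sigma_{xv} = 4\mu_v = \nabla_{\mu_v}\phi^*(\mu_v) \\
\tilde{x}(\psi, \mu_v, N) &= \frac{2N}{N+1}\,\psi^2, \qquad
\lim_{N \to \infty} \tilde{x}(\psi, \mu_v, N) = 2\psi^2 = \phi^*(\psi)
\end{align*}
```

The last line uses the fact that, given the sample sum, the shares of each $A_n$ in that sum are uniformly distributed on the simplex whatever the value of $\mu_v$. Hence $\tilde{x}$ does not involve $\mu_v$, so that $H = 0$ in this example.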


In short, AS holds if $\tilde{x}$, viewed as a function of $\bar{V}^t$, has the same
functional form in a large sample as the macro relation $\phi^*$, viewed as a function of
$\mu_v^t$. This condition represents a relatively strong restriction on the forms of
$f$ and/or $p$. However, as indicated in Section 4, AS embodies virtually all types
of aggregation assumptions from the economics and statistics literatures.²⁰
Therefore, AS can be viewed as a generalized aggregation condition.

A small sample counterpart to AS can be defined as

Definition 2: $\bar{V}^t$ is sufficient for determining $\bar{x}^t$ if there exists a function
$\tilde{x}^*$ of the $M + 1$ arguments $\bar{V}^t$ and $N^t$ such that

$$\tilde{x}(\bar{V}^t, \mu_v^t, N^t) = \tilde{x}^*(\bar{V}^t, N^t)$$

The small sample definition requires that $\tilde{x}(\bar{V}^t, \mu_v^t, N^t)$ can be written
without reference to $\mu_v^t$, for all $N^t$. AS requires this property to hold in the
limit as $N^t \to \infty$. Clearly if $\bar{V}^t$ is sufficient for determining $\bar{x}^t$, then AS holds,
as well as the conclusion of Theorem 3.4.

The conditions of Definition 2 have appeared previously in the statistics
literature in a slightly different context. $\bar{V}^t$ sufficient for determining
$\bar{x}^t$ represents the precise condition under which the well-known Rao-Blackwell
Theorem holds,²¹ which states that $\tilde{x}$ is the best²² unbiased estimator of
$E(\bar{x}^t) = \phi^*(\mu_v^t)$ based on $\bar{V}^t$. AS guarantees that $\tilde{x}$ will converge to such a best
estimator as $N^t \to \infty$.

We now turn to two extensions of the basic notation, with the
reinterpretation of Theorem 3.4 under each.

3.2   Extensions 

There are two extensions of the basic framework which are of interest to
empirical uses of the AS property. The first is to allow for the behavioral
function $f$ to vary over time. The second is to allow for more distributional
parameters than average statistics ($L > M$). These extensions are discussed
formally below and illustrated in the examples of Section 4.

Suppose first that the behavioral function $f$ varies over time, as indicated
by a vector of parameters $\gamma^t$. Thus, $f$ is rewritten as

$$x_n^t = f(A_n^t, \gamma^t) \tag{3.14}$$

extending the previous notation to include $\gamma^t$. From the development of Section
3.1, we see that all functions deriving from expectations of $x_n^t$ or $\bar{x}^t$ will now
depend on $\gamma^t$ (i.e. $\phi$, $\phi^*$, $\nabla_{\mu_v}\phi^*$, $\tilde{x}$, $G$, $G^{**}$, $H$, $\phi^{**}$, $\phi^{***}$ and $\tilde{x}^*$). In particular, the
macro function $\phi^*$ now depends on both $\mu_v^t$ and $\gamma^t$, with $\mu_v^t$ representing
distribution parameters and $\gamma^t$ representing behavioral parameters. The defining
condition (3.13) of AS is replaced by

$$\lim_{N^t \to \infty} \tilde{x}(\psi, \mu_v^t, N^t, \gamma^t) = \phi^*(\psi, \gamma^t) \quad \text{for all } \psi, \mu_v^t \in \Phi \tag{3.15}$$

where each list of arguments is extended to reflect dependence on $\gamma^t$.

Under this additional consideration all of the results given above hold,
with the proviso that the $\gamma$ argument in all functions is held constant at
$\gamma = \gamma^{t_0}$, the behavioral parameters for the period $t_0$ of the cross section.
Theorem 3.4 is now stated as: given asymptotic sufficiency of $\bar{V}^t$ for $\bar{x}^t$ (using
condition (3.15)), the slope coefficients from a cross section random sample at
time $t_0$ will consistently estimate $\nabla_{\mu_v}\phi^*$, evaluated at both $\mu_v^{t_0}$ and $\gamma^{t_0}$.

V 

This extension is of interest to actual empirical uses of these results
because there are often variables common to all micro agents which modify their
behavior (e.g. common prices, general economic conditions, etc.). $\phi^*$ must
be modeled with regard to both distributional influences ($\mu_v^t$) and common
parameter influences ($\gamma^t$). Here OLS slope estimates from a cross section can be
used to estimate the derivatives of $\phi^*$ with regard to distributional variables
in a given time period, and thus can be used either to judge restrictive
assumptions on the form of $\phi^*$ or pooled with average time series data for more
precise estimation of $\phi^*$. In this way, if cross section random samples are
available for several time periods, slope estimates from each data base can be used
to indicate structural changes in $\phi^*$, and thus guide the choice of a model
consistent with all available evidence. Similarly, multiple sets of estimates can be
pooled in the estimation of such a model. In addition, this extension is
important in consideration of exact aggregation models, which are reviewed in
Section 4.

For the second extension, assume that $f$ is not changing over time ($f$ is
given in (2.1)), but that $\theta^t$ is an $L$ vector, $L > M$, where $M$ is the number of
average statistics $\bar{V}_m^t$, $m = 1,\ldots,M$. The inversion $\mu_v^t \leftrightarrow \theta^t$ (Assumption A2) is now
performed with regard to $M$ elements of $\theta^t$, conditional on the value of the
remaining $L - M$ parameters, denoted $\theta_0^t$.²³ This implies that all functions
deriving from expectations using the $(\mu_v^t, \theta_0^t)$ parameterization will depend
explicitly on $\theta_0^t$ (i.e. $p^*$, $\phi^*$, $\tilde{x}$, $G$, $G^{**}$, $H$, $\phi^{**}$ and $\tilde{x}^*$).

In the same fashion as the first extension, all the results of Section 3.1
hold, with the proviso that $\theta_0^t$ is held constant. In particular, if the
defining equation (3.13) of AS is replaced by

$$\lim_{N^t \to \infty} \tilde{x}(\psi, \mu_v^t, N^t, \theta_0^t) = \phi^*(\psi, \theta_0^t) \quad \text{for all } \psi, \mu_v^t \in \Phi \tag{3.16}$$

where $\theta_0^t$ has been appended to the lists of arguments, then Theorem 3.4 states
that OLS slope coefficients from a cross section at time $t_0$ will estimate the
partial derivatives of $\phi^*$ with respect to $\mu_v^t$, holding $\theta_0^t$ constant at $\theta_0^{t_0}$.
This extension is of empirical use when certain distributional characteristics
have been observed as constant over time,²⁴ as the modeling process can
embody this constancy.

A word of caution is required for uses of distributional constancy,
however, as the choice of $\theta_0^t$ (the parameters held constant) is crucial to AS.
In other words, a particular choice of $L - M$ parameters $\theta_0^t$ may cause a violation
of (3.16). This situation is illustrated by Example 2 of Section 4.
In short, the validity of AS depends on which set of distributional parameters
is assumed constant.
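
The dependence on which parameters are held constant can be seen in a small simulation. The setup below is an assumption made for illustration and is not the paper's Example 2: a scalar attribute A ~ N(mu, sigma^2) with v(A) = A and a hypothetical micro function f(A) = A^2, so that L = 2, M = 1 and the mean of x is mu^2 + sigma^2. The cross section slope converges to 2*mu. That equals the macro derivative when sigma^2 is the parameter held constant, but not when the coefficient of variation c = sigma/mu is held constant, in which case the macro function is (1 + c^2)*mu^2.

```python
import random

def ols_slope(x, v):
    """Single-regressor OLS slope: sample cov(x, v) divided by sample var(v)."""
    n = len(v)
    mx, mv = sum(x) / n, sum(v) / n
    sxv = sum((a - mx) * (b - mv) for a, b in zip(x, v))
    svv = sum((b - mv) ** 2 for b in v)
    return sxv / svv

mu0, sigma0 = 2.0, 1.0  # hypothetical parameters in the cross section period
rng = random.Random(1)
attrs = [rng.gauss(mu0, sigma0) for _ in range(200_000)]
b_K = ols_slope([a * a for a in attrs], attrs)

# Macro derivative if sigma^2 is held constant: d(mu^2 + sigma^2)/d(mu)
deriv_sigma_fixed = 2.0 * mu0
# Macro derivative if c = sigma/mu is held constant: d((1 + c^2) * mu^2)/d(mu)
c0 = sigma0 / mu0
deriv_cv_fixed = 2.0 * (1.0 + c0 ** 2) * mu0

print(b_K, deriv_sigma_fixed, deriv_cv_fixed)
```

Under these assumptions the slope matches the first derivative and misses the second, so a cross section regression describes the macro function only if the variance, rather than the coefficient of variation, is the distributional feature that stays fixed over time.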

As a practical matter, this problem is of small import when $\bar{V}^t$ represents
all available distribution data over time. All results must necessarily be
prefaced by "holding all unobserved distribution parameters constant." Although
not always explicitly stated, this is a requirement of virtually all empirical
studies of macro functions.

However, this consideration does point out two ways OLS slope regression
coefficients from a cross section can fail to describe the macro function.
First is the failure of AS, with $H \neq 0$, giving for large $N^t$ that $\tilde{x}$ has a different
functional relationship to $\bar{V}^t$ than $\phi^*$ does to $\mu_v^t$. The second is when there are
additional distribution parameters $\theta_0^t$ which vary over time, influence the mean
of $x_n^t$, and are not captured by $\bar{V}^t$ movements.²⁵



4.   EXAMPLES  AND  THE  RELATION  TO  PREVIOUS  AGGREGATION  APPROACHES 

This  section  presents  examples  which  illustrate  the  main  theorems  and 
notation,  and  connects  the  results  here  to  previous  aggregation  approaches. 

Example 1: Here $f(A_n^t)$ is assumed to be a linear function in $v(A_n^t)$; i.e.

    $$x_n^t = a_0 + a'v(A_n^t) + \varepsilon(A_n^t) \qquad (4.1)$$

where $a_0$ is a constant, $a$ is an $M$-vector of constants and $\varepsilon(A_n^t)$ is a residual,
with mean 0 and uncorrelated with $v(A_n^t)$.$^{26}$

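Under our assumptions we have:

    $$\phi^*(\mu_v^t) = E(x^t) = a_0 + a'\mu_v^t$$

    $$\nabla_{\mu_v}\phi^* = a$$

    $$\tilde{x} = E(\bar{x}^t \mid \bar{v}^t) = a_0 + a'\bar{v}^t$$

    $$\nabla_{\bar{v}}\tilde{x} = a, \qquad G(\mu_v^t) = \operatorname{plim} \nabla_{\bar{v}}\tilde{x} = a$$

    $$\phi^{**}(\psi, \mu_v^t) = a_0 + a'\psi$$

    $$G^{**}(\psi, \mu_v^t) = \nabla_{\psi}\phi^{**} = a$$

    $$H(\psi, \mu_v^t) = \nabla_{\mu_v}\phi^{**} = 0$$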


Clearly  the  OLS  slope  coefficients  from  a  cross  section  will  consistently 
estimate  a,  either  by  usual  least  squares  theory  or  by  our  general  development. 


The linear functional form (4.1) eliminates all distribution parameters in $\phi^*$
other than $\mu_v^t$.

This simple linear functional form has appeared in two extended forms
in the economics literature.$^{27}$ The first is the exact aggregation format
of Gorman (1953), Muellbauer (1975, 1977) and Lau (1980), where the
constant coefficients $a_0$ and $a$ are allowed to be time varying with respect
to a common set of parameters $\gamma^t$.$^{28}$ Thus, $a_0$ and $a$ of (4.1) are
replaced by $a_0(\gamma^t)$ and $a(\gamma^t)$ to give

    $$x_n^t = a_0(\gamma^t) + a(\gamma^t)'v(A_n^t) + \varepsilon(A_n^t) \qquad (4.2)$$

Our results show that slope coefficients from a cross section at time
$t = t_0$ will consistently estimate $a(\gamma^{t_0})$.

The form (4.2) arises from the existence of aggregate macro functions
which are independent of the underlying distribution form. More
specifically, Lau (1980) proved the following important and general
theorem, summarized in our notation as: Suppose that for all underlying
configurations of $\{A_n^t,\ n=1,\ldots,N^t\}$, $\bar{x}^t$ can be written as

    $$\bar{x}^t = F\big(\gamma^t,\ g_1(A_1^t,\ldots,A_{N^t}^t),\ \ldots,\ g_M(A_1^t,\ldots,A_{N^t}^t)\big) \qquad (4.3)$$

where $g_m$, $m=1,\ldots,M$ are symmetric functions of $A_1^t,\ldots,A_{N^t}^t$. Then, under
some general conditions we must have

    i)   $g_m(A_1^t,\ldots,A_{N^t}^t) = \bar{v}_m^t$,   $m=1,\ldots,M$

    ii)  $x_n^t = a_0(\gamma^t) + a(\gamma^t)'v(A_n^t)$                    (4.4)

    iii) $\bar{x}^t = a_0(\gamma^t) + a(\gamma^t)'\bar{v}^t$

With no distributional restrictions, the form (4.3) requires the symmetric
functions $g_m$ to be averages, and $x_n^t = f(A_n^t)$ must be a linear function (with
constant coefficients given $t$) in the components of the $g_m$ functions. Thus,
a linear function is required for aggregation schemes free of distribution
restrictions.

The second extension of the simple linear model (4.1) is the consistent
aggregation approach of Theil (1953, 1975), where the fixed coefficients $a_0$, $a$
are replaced by coefficients which vary randomly across the population,
independently of the predictor variables $v(A_n^t)$, and have constant means over time.
Thus, (4.1) is extended as

    $$x_n^t = f(A_n^t) = a_0(A_n^t) + a(A_n^t)'v(A_n^t) + \varepsilon(A_n^t) \qquad (4.5)$$

where $a_0(A_n^t)$ is a scalar random variable and $a(A_n^t)$ is a random $M$-vector
which both vary independently of $v(A_n^t)$.$^{29}$ Denoting the (constant)
coefficient means as $a_0 = E(a_0(A_n^t) \mid \theta^t)$ and $a = E(a(A_n^t) \mid \theta^t)$ gives$^{30}$

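    $$\phi^*(\mu_v^t) = a_0 + a'\mu_v^t$$

    $$\nabla_{\mu_v}\phi^* = a$$

    $$\tilde{x} = a_0 + a'\bar{v}^t, \qquad \nabla_{\bar{v}}\tilde{x} = a, \qquad G(\mu_v^t) = a \qquad (4.6)$$

    $$\phi^{**}(\psi, \mu_v^t) = a_0 + a'\psi$$

    $$G^{**}(\psi, \mu_v^t) = \nabla_{\psi}\phi^{**} = a$$

    $$H(\psi, \mu_v^t) = \nabla_{\mu_v}\phi^{**} = 0$$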

Thus OLS slope coefficients from a cross section will consistently estimate $a$,
the mean of the marginal coefficient distribution. This framework embodies two
types of assumptions, the linear functional form assumption of (4.5), and
the partial distribution assumption that $a(A_n^t)$ varies independently of $v(A_n^t)$.
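
The consistent aggregation result can be illustrated with a small simulation. The sketch below is not from the original text: the sample size, the uniform coefficient distributions and the noise scale are all hypothetical choices made for illustration. It draws random coefficients independently of the predictor and checks that the cross section OLS slope is close to the mean marginal coefficient.

```python
import random

def ols_slope(v, x):
    """Slope from an OLS regression of x on v with an intercept."""
    n = len(v)
    v_bar = sum(v) / n
    x_bar = sum(x) / n
    s_xv = sum((vi - v_bar) * (xi - x_bar) for vi, xi in zip(v, x))
    s_vv = sum((vi - v_bar) ** 2 for vi in v)
    return s_xv / s_vv

rng = random.Random(0)   # fixed seed so the run is reproducible
n = 100_000              # hypothetical cross-section size

# Predictor v(A_n) and random coefficients drawn independently of it.
v = [rng.gauss(1.0, 1.0) for _ in range(n)]
a0_n = [rng.uniform(0.0, 1.0) for _ in range(n)]   # intercepts, mean 0.5
a_n = [rng.uniform(1.0, 3.0) for _ in range(n)]    # slopes, mean a = 2.0

# Micro relation in the form of (4.5), with a mean-zero residual.
x = [a0 + a * vi + rng.gauss(0.0, 0.3) for a0, a, vi in zip(a0_n, a_n, v)]

slope = ols_slope(v, x)
mean_coefficient = 2.0
print(slope, mean_coefficient)
```

If the coefficients were instead correlated with $v(A_n^t)$, the independence assumption would fail and the slope would generally no longer recover the mean coefficient.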

Exact and consistent aggregation formats can easily be combined into a
general linear model, allowing random coefficients which vary independently
of $v(A_n^t)$, and whose means vary over time. This specification is given as

    $$x_n^t = a_0(A_n^t) + a(A_n^t)'v(A_n^t) + \varepsilon(A_n^t) \qquad (4.7)$$

where $a_0(\theta^t) = E(a_0(A_n^t) \mid \theta^t)$ and $a(\theta^t) = E(a(A_n^t) \mid \theta^t)$ are time varying. If
$a_0$ and $a$ do not vary with $\mu_v^t$,$^{31}$ then AS holds, and our previous arguments
establish the validity of our previous theorems.

Our next example illustrates where AS holds but does not rely on
linearity of $f$ in $v(A_n^t)$.

Example 2: Suppose for simplicity that $v(A_n^t) = A_n^t$, a scalar random variable
distributed normally with mean $\mu^t$ and variance $\sigma_0^2$ at time $t$, where $\sigma_0^2$ is
constant over $t$. Suppose that the true functional relationship is quadratic
in $A_n^t$:

    $$x_n^t = f(A_n^t) = a_0 + a_1 A_n^t + a_2 (A_n^t)^2 \qquad (4.8)$$

where $a_0$, $a_1$ and $a_2$ are constants. Using normality we have (with $\bar{v}^t = \sum_n A_n^t / N^t$)

    $$\phi^*(\mu^t) = a_0 + a_1\mu^t + a_2(\mu^t)^2 + a_2\sigma_0^2$$

    $$\nabla_{\mu}\phi^* = a_1 + 2a_2\mu^t$$

    $$\tilde{x} = E(\bar{x}^t \mid \bar{v}^t) = a_0 + a_1\bar{v}^t + a_2(\bar{v}^t)^2 + a_2\,\frac{N^t - 1}{N^t}\,\sigma_0^2$$

    $$\nabla_{\bar{v}}\tilde{x} = a_1 + 2a_2\bar{v}^t$$

    $$G(\mu^t) = \operatorname{plim} \nabla_{\bar{v}}\tilde{x} = a_1 + 2a_2\mu^t$$

    $$\phi^{**}(\psi, \mu^t) = a_0 + a_1\psi + a_2\psi^2 + a_2\sigma_0^2$$

    $$G^{**}(\psi, \mu^t) = \nabla_{\psi}\phi^{**} = a_1 + 2a_2\psi$$

    $$H(\psi, \mu^t) = \nabla_{\mu}\phi^{**} = 0$$

Our results state that regressing $x_n^{t_0}$ on $A_n^{t_0}$ in a cross section gives a
slope coefficient which consistently estimates $\nabla_{\mu}\phi^*$.$^{32}$
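
A simulated cross section gives a quick numerical check of this claim. The sketch below is an illustration only: the coefficient values, the mean, the variance and the sample size are hypothetical. It draws $A_n$ from a normal distribution, applies the quadratic micro relation (4.8), and compares the OLS slope with $a_1 + 2a_2\mu$.

```python
import random

def ols_slope(v, x):
    """Slope from an OLS regression of x on v with an intercept."""
    n = len(v)
    v_bar = sum(v) / n
    x_bar = sum(x) / n
    s_xv = sum((vi - v_bar) * (xi - x_bar) for vi, xi in zip(v, x))
    s_vv = sum((vi - v_bar) ** 2 for vi in v)
    return s_xv / s_vv

# Hypothetical parameter values for the illustration.
mu, sigma0 = 2.0, 1.5
a0, a1, a2 = 1.0, 0.5, 0.25
n = 200_000

rng = random.Random(0)  # fixed seed so the run is reproducible
A = [rng.gauss(mu, sigma0) for _ in range(n)]
x = [a0 + a1 * a + a2 * a * a for a in A]   # quadratic micro relation (4.8)

slope = ols_slope(A, x)
macro_derivative = a1 + 2 * a2 * mu         # derivative of phi* with respect to mu
print(slope, macro_derivative)
```

The micro relation is nonlinear, yet the linear regression slope matches the macro derivative because the sample mean is sufficient for $\mu$ when the variance is held fixed.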

This example illustrates the second extension of Section 3.2. Here
there are two distributional parameters $\mu^t$ and $\sigma_0^2$, and AS holds considering
$\sigma_0^2$ held constant. If $\sigma_0^2$ is not constant (denoted $\sigma_t^2$) then $\nabla_{\mu}\phi^*$ gives
only the partial derivative of $\phi^*$ with respect to $\mu^t$, and thus captures only
a part of the change of $\phi^*$ from distribution movements. Alternatively, if
$\tau = \sigma_t/\mu^t$ (the coefficient of variation) is the distributional aspect held
constant, then reparameterizing the normal distribution in terms of $\mu^t$, $\tau$
instead of $\mu^t$, $(\sigma_t)^2$ gives

    $$\phi^*(\mu^t, \tau) = a_0 + a_1\mu^t + a_2(\mu^t)^2 + a_2\tau^2(\mu^t)^2$$

    $$\nabla_{\mu}\phi^* = a_1 + 2a_2\mu^t + 2a_2\tau^2\mu^t$$

    $$\tilde{x} = a_0 + a_1\bar{v}^t + a_2(\bar{v}^t)^2 + a_2\tau^2(\mu^t)^2\,\frac{N^t - 1}{N^t}$$

    $$\nabla_{\bar{v}}\tilde{x} = a_1 + 2a_2\bar{v}^t$$

    $$\phi^{**}(\psi, \mu^t) = a_0 + a_1\psi + a_2\psi^2 + a_2\tau^2(\mu^t)^2$$

    $$G^{**}(\psi, \mu^t) = \nabla_{\psi}\phi^{**} = a_1 + 2a_2\psi$$

    $$H(\psi, \mu^t) = 2a_2\tau^2\mu^t$$

Here AS does not hold, and thus the conclusion of Theorem 3.4 is invalid.$^{33}$


Thus,  AS  depends  on  exactly  which  distributional  aspects  are  assumed  constant. 
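
The size of the discrepancy under a constant coefficient of variation can be checked numerically. The sketch below is an illustration with hypothetical parameter values. It differentiates the macro function of Example 2 numerically with $\tau$ held fixed and compares the result with the limit of the cross section slope; the gap should equal $H = 2a_2\tau^2\mu^t$.

```python
def macro_function(mu, tau, a0, a1, a2):
    """E(x) for x = a0 + a1*A + a2*A^2 when A ~ N(mu, (tau*mu)^2)."""
    return a0 + a1 * mu + a2 * mu ** 2 + a2 * tau ** 2 * mu ** 2

def cross_section_slope_limit(mu, a1, a2):
    """Probability limit of the OLS slope of x on A under normality."""
    return a1 + 2 * a2 * mu

a0, a1, a2 = 1.0, 0.5, 0.25   # hypothetical coefficients
mu, tau = 2.0, 0.4            # hypothetical mean and coefficient of variation

# Central difference in mu with tau held constant (exact for a quadratic,
# up to floating point rounding).
h = 1e-4
macro_derivative = (macro_function(mu + h, tau, a0, a1, a2)
                    - macro_function(mu - h, tau, a0, a1, a2)) / (2 * h)

gap = macro_derivative - cross_section_slope_limit(mu, a1, a2)
H = 2 * a2 * tau ** 2 * mu
print(gap, H)
```

With $\sigma_0^2$ held constant instead, the last term of the macro function does not move with $\mu^t$ and the gap disappears.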

The power of the results in example 2 (with $\sigma_0^2$ constant) arises
from the normality assumption on $A_n^t$. In particular, $\bar{v}^t = \sum_n A_n^t / N^t$ is a sufficient
statistic for $\mu^t$ in the usual statistical sense. More generally, the observed
$M$ vector $\bar{v}^t$ is sufficient for $\theta^t$ if the conditional distribution of $A_1^t,\ldots,A_{N^t}^t$
given $\bar{v}^t$ is independent of $\theta^t$: formally, if $\bar{P}$ represents the conditional
distribution:

    $$\bar{P}(A_1^t,\ldots,A_{N^t}^t \mid \bar{v}^t = \psi) =
      \begin{cases}
        \dfrac{\prod_{n=1}^{N^t} p(A_n^t \mid \theta^t)}{P(\psi, \theta^t)} & \text{if } \bar{v}^t = \psi \\[2ex]
        0 & \text{if } \bar{v}^t \neq \psi
      \end{cases}$$

where $P(\bar{v}^t, \theta^t)$ is the marginal distribution of $\bar{v}^t$, then $\bar{v}^t$ is a sufficient
statistic for $\theta^t$ if $\bar{P}$ does not depend on $\theta^t$.$^{34}$ Clearly in this case $\tilde{x} =
E(\bar{x}^t \mid \bar{v}^t)$ depends only on $\bar{v}^t$, and so $\bar{v}^t$ is sufficient for determining $\bar{x}^t$ as
in Definition 2 of Section 3. In this case AS holds for an arbitrary micro
relation $f$ in accordance with Assumptions A1-A5.

The theory of sufficient statistics is motivated by the question of when
a particular set of statistics captures all of the information from a sample
relevant to the distributional parameters $\theta^t$. As such, it is a theory of
aggregation in the same sense as the linear exact aggregation theory of
economics.$^{35}$ A major theorem in the statistical literature proven by Koopman (1936),
Darmois (1935) and Pitman (1936) states under some regularity conditions that
a sufficient statistic $\eta(A_1^t,\ldots,A_{N^t}^t)$ for $\theta^t$ of dimension $M < N$ exists if and
only if

    $$\eta(A_1^t,\ldots,A_{N^t}^t) = \sum_{n=1}^{N^t} v(A_n^t) = N^t\bar{v}^t \qquad (4.10)$$

i.e. $\eta$ is a sum of functions of the individual $A_n^t$, and the distribution $p(A \mid \theta^t)$
has the form$^{36}$

    $$p(A \mid \theta^t) = C(\theta^t)\,h(A)\exp\Big[\sum_{m=1}^{M} \pi_m(\theta^t)\,v_m(A)\Big] \qquad (4.11)$$

Distributions of the form (4.11) comprise the exponential family of
distributions. Notice the similarity of exact aggregation, requiring a linear
$f$ structure, and sufficient statistics, requiring a linear structure for
$\ln p$ as in (4.11).

In the discussion of examples 1 and 2 above, we have reviewed two sets
of aggregation assumptions which embody completely different restrictions
on the individual function $f$ and the distribution $p$. The first is exact
aggregation, which requires $f$ to be a linear function of $v(A_n^t)$, with no
explicit distribution assumptions. The second uses sufficient statistics,
in requiring $p$ to be such that $\bar{v}^t$ is a sufficient statistic for $\theta^t$, with no
explicit assumptions on $f$. Both of these sets of aggregation assumptions
imply AS, and therefore asymptotic sufficiency of $\bar{v}^t$ in determining $\bar{x}^t$ can be
viewed as a generalized aggregation assumption. In addition, exact
aggregation and sufficient statistics represent polar extremes under which AS holds,
as shown by the following theorem and corollary:


Theorem 4.1: Under Assumptions A1, A2, A4, A5 and the regularity conditions
presented in the Appendix, we have

    $$\nabla_{\mu_v}\tilde{x}(\psi, \mu_v^t, N^t) = E(K^t\delta^t \mid \bar{v}^t = \psi)$$

where

    $$\delta^t = \bar{x}^t - \tilde{x}(\psi, \mu_v^t, N^t)$$

    $$K^t = \sum_{n=1}^{N^t} \nabla_{\mu_v}\ln p^*(A_n^t \mid \mu_v^t) - E\Big(\sum_{n=1}^{N^t} \nabla_{\mu_v}\ln p^*(A_n^t \mid \mu_v^t) \,\Big|\, \bar{v}^t = \psi\Big)$$

Also

    $$H(\psi, \mu_v^t) = \lim_{N^t \to \infty} E(K^t\delta^t \mid \bar{v}^t = \psi)$$

Proof: See the Appendix.

The regularity conditions referred to in the statement of Theorem 4.1
just ensure that derivative operators may be passed under the integral
used in defining $\tilde{x}$. The following corollary is immediate.


Corollary 4.2: Under the conditions of Theorem 4.1, $\bar{v}^t$ is sufficient for
determining $\bar{x}^t$ if $\bar{x}^t$ and $\sum_{n=1}^{N^t} \nabla_{\mu_v}\ln p^*(A_n^t \mid \mu_v^t)$ have zero covariance conditional
on $\bar{v}^t$ for all $\mu_v^t \in \Phi$. AS holds if this covariance converges to zero as $N^t \to \infty$
for all $\mu_v^t \in \Phi$.

$\delta^t = 0$, or $\bar{x}^t = \tilde{x}(\bar{v}^t, \mu_v^t, N^t)$, holds for an arbitrary distribution form $p$ if
and only if $f(A_n^t)$ is a linear function of $v(A_n^t)$; i.e. the conditions for
exact aggregation hold. This follows by a straightforward application of
Lau's Theorem (Lau (1980)). Similarly $K^t = 0$ corresponds to the case where
$p$ is of the exponential family form (4.11), with $\bar{v}^t$ a sufficient statistic
for $\theta^t$. In this sense exact aggregation and sufficient statistics represent
polar extreme sets of assumptions under AS.

Aggregation assumptions making partial functional form and distribution
assumptions obey AS if and only if $\bar{v}^t$ effectively determines all interaction
between $\bar{x}^t$ and the gradient of the log likelihood function $\sum_{n=1}^{N^t} \nabla_{\mu_v}\ln p^*(A_n^t \mid \mu_v^t)$.
The zero covariance required by Corollary 4.2 thus gives the correct trade-offs
between making functional form assumptions and distribution form assumptions
under AS. In this way, the consistent aggregation approach of Theil relaxes
the constant coefficient feature of exact aggregation models, and appends the
assumption of random coefficients which vary independently of the predictor
variables $v(A_n^t)$.

In order to further illustrate Theorem 4.1, consider the following example
motivated by the standard errors-in-variables model:

Example 3: Suppose that

    $$x_n^t = \beta u(A_n^t) + s(A_n^t)$$

and

    $$v(A_n^t) = u(A_n^t) + r(A_n^t)$$

where $u(A_n^t)$, $s(A_n^t)$ and $r(A_n^t)$ have independent normal distributions with
$E(u(A_n^t)) = \mu^t$, $E(s(A_n^t)) = E(r(A_n^t)) = 0$, $\mathrm{Var}(u(A_n^t)) = \sigma_u^2$, $\mathrm{Var}(s(A_n^t)) = \sigma_s^2$,
$\mathrm{Var}(r(A_n^t)) = \sigma_r^2$, and $\sigma_u^2$, $\sigma_r^2$ and $\sigma_s^2$ are assumed constant over time. Our
aim is to study $E(x_n^t \mid \mu^t) = \phi^*(\mu^t) = \beta\mu^t$ as a function of $E(v(A_n^t) \mid \mu^t) = \mu^t$.
We have that

    $$x_n^t = \beta v(A_n^t) - \beta r(A_n^t) + s(A_n^t)$$

and so (using normality)

    $$\tilde{x}(\bar{v}^t, \mu^t, N^t) = (\beta - \beta\lambda)\bar{v}^t + \beta\lambda\mu^t$$

    $$\nabla_{\bar{v}}\tilde{x} = \beta - \beta\lambda = G(\mu^t)$$

    $$\phi^{**}(\psi, \mu^t) = (\beta - \beta\lambda)\psi + \beta\lambda\mu^t$$

    $$G^{**}(\psi, \mu^t) = \nabla_{\psi}\phi^{**} = \beta - \beta\lambda$$

    $$H(\psi, \mu^t) = \nabla_{\mu}\phi^{**} = \beta\lambda$$

where $\lambda = \sigma_r^2/(\sigma_u^2 + \sigma_r^2)$. Unless $\lambda = 0$ ($\beta = 0$ is ruled out by Assumption A4), AS
does not hold. Corresponding to this is the familiar result that $\operatorname{plim}\hat{b} =
\beta(1 - \lambda) \neq \beta$, where $\hat{b}$ is the OLS slope coefficient obtained by regressing
$x_n^t$ on $v(A_n^t)$ in a cross section. In accordance with Theorem 4.1, we have

    $$\delta^t = \beta\lambda(\bar{v}^t - \mu^t) - \beta\bar{r}^t + \bar{s}^t$$

and

    $$K^t = \frac{N^t}{\sigma_u^2}\big(\lambda(\bar{v}^t - \mu^t) - \bar{r}^t\big)$$

with $\bar{r}^t$, $\bar{s}^t$ defined as the appropriate averages, and we can easily calculate
$E(K^t\delta^t \mid \bar{v}^t = \psi) = \beta\lambda = H(\psi, \mu^t)$.
This illustrates the result of Theorem 4.1.
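
The attenuation result in Example 3 is easy to reproduce by simulation. The sketch below is an illustration only: the values of $\beta$, $\mu$, the three standard deviations and the sample size are hypothetical. It generates the errors-in-variables data and compares the OLS slope with $\beta(1 - \lambda)$.

```python
import random

def ols_slope(v, x):
    """Slope from an OLS regression of x on v with an intercept."""
    n = len(v)
    v_bar = sum(v) / n
    x_bar = sum(x) / n
    s_xv = sum((vi - v_bar) * (xi - x_bar) for vi, xi in zip(v, x))
    s_vv = sum((vi - v_bar) ** 2 for vi in v)
    return s_xv / s_vv

# Hypothetical parameter values for the illustration.
beta, mu = 2.0, 1.0
sigma_u, sigma_r, sigma_s = 1.0, 0.5, 0.3
n = 200_000

rng = random.Random(1)  # fixed seed so the run is reproducible
u = [rng.gauss(mu, sigma_u) for _ in range(n)]
v = [ui + rng.gauss(0.0, sigma_r) for ui in u]          # observed predictor
x = [beta * ui + rng.gauss(0.0, sigma_s) for ui in u]   # dependent variable

lam = sigma_r ** 2 / (sigma_u ** 2 + sigma_r ** 2)
b_hat = ols_slope(v, x)
attenuated = beta * (1 - lam)   # plim of the OLS slope
H = beta * lam                  # the part of the macro derivative that is missed
print(b_hat, attenuated, H)
```

The macro derivative is $\beta$ while the regression recovers only $\beta(1 - \lambda)$, so the shortfall is exactly $H = \beta\lambda$.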


Consistent and exact aggregation schemes directly imply a linear macro
function $\phi^*$ in $\mu_v^t$. Aggregation schemes using sufficient statistics rely
wholly on assumptions on $p$, and can be consistent with both linear and nonlinear
$\phi^*$ formulations. In the next section we show how additional cross section
moments can be used to estimate the derivatives of $\phi^*(\mu_v^t)$ of all orders, when
the distribution $p$ is of the exponential family form (4.11). Through this
development a general test of linearity (consistent or exact aggregation)
emerges, which relies only on our basic population assumptions.


5.   SUFFICIENT  STATISTICS  AND  MACRO  FUNCTIONS 

In this section a methodology is presented for estimating second-order
derivatives of the macro function with respect to $\mu_v^t$ from cross section
moments, when $p$ is a member of the exponential family (4.11). This
methodology amounts to repeated application of derivatives, and extends to
derivatives of $\phi^*$ of all orders.

We begin by adopting:

ASSUMPTION A6: $p$ is a member of the exponential family in its natural
parameterization

    $$p(A \mid \pi^t) = C(\pi^t)\,h(A)\exp\Big(\sum_{m=1}^{M} \pi_m^t v_m(A)\Big) \qquad (5.1)$$

where

    $$C(\pi^t) = \Big[\int h(A)\exp\Big(\sum_{m=1}^{M} \pi_m^t v_m(A)\Big)\,dA\Big]^{-1}$$

and where $\theta^t$ has been reparameterized by $\pi^t = (\pi_1^t,\ldots,\pi_M^t)$.

$\theta^t$ of (4.11) has been replaced in (5.1) by the coefficients $\pi_m(\theta^t)$, $m=1,\ldots,M$,
here considered as independent parameters. (5.1) holds without loss of
generality from (4.11) if the mapping $\theta^t \to (\pi_1(\theta^t),\ldots,\pi_M(\theta^t))$ is of full rank
$M$. Thus, Assumption A6 just eliminates constraints across $\pi_m(\theta^t)$, $m=1,\ldots,M$,
which, from an empirical point of view, are unnecessary at the outset.
Two  useful  textbook  facts  about  the  form  (5.1)  are: 


Lemma 5.1: Under Assumption A6, the natural parameter space
$\Gamma = \{\pi^t \mid p(A \mid \pi^t) \text{ is a density}\}$ is convex.

Lemma 5.2: If $\psi(A_1^t,\ldots,A_{N^t}^t)$ is a function for which the integral

    $$\int\!\cdots\!\int \psi(A_1^t,\ldots,A_{N^t}^t) \prod_{n=1}^{N^t} h(A_n^t)\exp\Big[\sum_{m=1}^{M} \pi_m v_m(A_n^t)\Big]\,dA_1^t \cdots dA_{N^t}^t$$

exists for all $\pi \in \Gamma$, then this integral is an analytic function of $\pi$ at all
interior points of $\Gamma$, and derivatives of all orders with respect to $\pi \in \Gamma$ may
be passed beneath the integral sign (for discrete exponential families this
integral is replaced by a sum).

Proofs of these lemmas can be found in Lehmann (1959). They allow a
computational method for taking derivatives of various expectations.
Recall, as in earlier sections, that we denote

    $$\phi(\pi^t) = E(x^t \mid \pi^t)$$

and that

    $$C(\pi^t) = \Big[\int h(A)\exp\Big(\sum_{m=1}^{M} \pi_m^t v_m(A)\Big)\,dA\Big]^{-1}$$

$C(\pi^t)$ appears in (5.1) as just a normalizing factor to make $p(A \mid \pi^t)$ a
density. Both $\phi$ and $C$ have some remarkable properties, however, as shown
in the following lemma:

Lemma 5.3: Under Assumptions A1 and A6, all derivatives of $\phi$ and $\ln C$ with
respect to $\pi^t$ are expressible as functions of moments of the $x_n^t$, $v(A_n^t)$
distribution. In particular, we have for $C$ that

    $$-\frac{\partial \ln C}{\partial \pi_m} = E(v_m(A) \mid \pi^t) = \mu_m^t, \qquad m=1,\ldots,M$$

    $$-\frac{\partial^2 \ln C}{\partial \pi_m \partial \pi_{m'}} = E\big((v_m(A) - \mu_m^t)(v_{m'}(A) - \mu_{m'}^t) \mid \pi^t\big) = \sigma_{mm'}^t, \qquad m, m'=1,\ldots,M$$

and

    $$-\frac{\partial^3 \ln C}{\partial \pi_m \partial \pi_{m'} \partial \pi_{\ell}} = E\big((v_m(A) - \mu_m^t)(v_{m'}(A) - \mu_{m'}^t)(v_{\ell}(A) - \mu_{\ell}^t) \mid \pi^t\big) = \sigma_{mm'\ell}^t, \qquad m, m', \ell=1,\ldots,M$$

For $\phi$ we have

    $$\frac{\partial \phi}{\partial \pi_m} = E\big((x - \phi(\pi^t))(v_m(A) - \mu_m^t) \mid \pi^t\big) = \sigma_{xm}^t$$

and

    $$\frac{\partial^2 \phi}{\partial \pi_m \partial \pi_{m'}} = E\big((x - \phi(\pi^t))(v_m(A) - \mu_m^t)(v_{m'}(A) - \mu_{m'}^t) \mid \pi^t\big) = \sigma_{xmm'}^t, \qquad m, m'=1,\ldots,M$$

Proof: The first statement follows from Lemma 5.2. The formulae
are obtainable by direct computation.$^{38}$

QED.
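
The formulae for $\ln C$ in Lemma 5.3 can be checked by hand on a familiar case. The following worked example is an added illustration, not part of the original argument: it takes the Poisson family with $M = 1$ and $v(A) = A$, for which the mean, the variance and the third central moment are all known to equal $\mu$.

```latex
% Poisson family in natural form, A = 0, 1, 2, ...
p(A \mid \pi) = C(\pi)\, h(A)\, \exp(\pi A), \qquad
h(A) = \frac{1}{A!}, \qquad
C(\pi) = \Big[\sum_{A=0}^{\infty} \frac{e^{\pi A}}{A!}\Big]^{-1} = \exp(-e^{\pi})

% Hence -\ln C(\pi) = e^{\pi}, and every derivative of -\ln C equals e^{\pi}:
-\frac{\partial \ln C}{\partial \pi} = e^{\pi} = \mu, \qquad
-\frac{\partial^2 \ln C}{\partial \pi^2} = e^{\pi} = \sigma_{11}, \qquad
-\frac{\partial^3 \ln C}{\partial \pi^3} = e^{\pi} = \sigma_{111}
```

These agree with the standard Poisson moments, with natural parameter $\pi = \ln \mu$.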

We are primarily interested in the behavior of $\phi(\pi^t)$ with respect
to changes in $\mu_v^t$. We proceed as before to reparameterize via the mapping

    $$\mu_v^t = E(v(A) \mid \pi^t) = g(\pi^t) \qquad (5.3)$$

In view of Lemma 5.3, this mapping is expressible as

    $$\mu_v^t = -\nabla_{\pi}\ln C(\pi^t) = g(\pi^t) \qquad (5.4)$$

We can reparameterize the distribution (5.1) in terms of $\mu_v^t$ if the mapping
$g$ is invertible; i.e. if the differential (Jacobian) matrix $dg$ is
nonsingular. This matrix, again from Lemma 5.3, can be written as

    $$dg = \Big(-\frac{\partial^2 \ln C}{\partial \pi_m \partial \pi_{m'}}\Big) = \Sigma_{vv}^t \qquad (5.5)$$

the covariance matrix of $v(A_n^t)$. Thus, under Assumptions A1 and A6,
Assumption A2 is guaranteed. We therefore form

    $$\pi^t = g^{-1}(\mu_v^t)$$

    $$p^*(A \mid \mu_v^t) = p(A \mid g^{-1}(\mu_v^t)) \qquad (5.6)$$

and

    $$\phi^*(\mu_v^t) = \phi(g^{-1}(\mu_v^t))$$


Under the additional Assumption A6, we can show the main result of Section
3.1 by direct computation.

Theorem 5.4: Under Assumptions A1 and A6 the gradient of $\phi^*$ with respect to
$\mu_v^t$ is

    $$\nabla_{\mu_v}\phi^*(\mu_v^t) = (\Sigma_{vv}^t)^{-1}\Sigma_{xv}^t$$

and so is consistently estimated by micro slope regression coefficients from
a single period random sample cross section.

Proof: By the chain rule

    $$\nabla_{\mu_v}\phi^* = (dg)^{-1}\nabla_{\pi}\phi$$

Now $(dg)^{-1} = (\Sigma_{vv}^t)^{-1}$ and by Lemma 5.3, $\nabla_{\pi}\phi = \Sigma_{xv}^t$.

QED.
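
Theorem 5.4 can be verified exactly on a discrete exponential family. The sketch below is an added illustration under stated assumptions: it uses the Poisson family with $v(A) = A$ and a hypothetical micro function $f(A) = A^2$, computes the population moments by a truncated sum, and compares the regression coefficient $\sigma_{xv}/\sigma_{vv}$ with a numerical derivative of the macro function.

```python
import math

def poisson_pmf(k, mu):
    """Poisson probability of k at mean mu, computed on the log scale."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def expect(g, mu, k_max=200):
    """E[g(A)] for A ~ Poisson(mu), truncating the sum at k_max."""
    return sum(g(k) * poisson_pmf(k, mu) for k in range(k_max + 1))

def f(a):
    """Hypothetical nonlinear micro function x = f(A)."""
    return a * a

def macro(mu):
    """Macro function phi*(mu) = E(f(A)) at mean mu."""
    return expect(f, mu)

def slope_from_moments(mu):
    """Population OLS slope sigma_xv / sigma_vv of x on v(A) = A."""
    phi = macro(mu)
    s_xv = expect(lambda k: (f(k) - phi) * (k - mu), mu)
    s_vv = expect(lambda k: (k - mu) ** 2, mu)
    return s_xv / s_vv

mu = 3.0
h = 1e-5
numeric_derivative = (macro(mu + h) - macro(mu - h)) / (2 * h)
slope = slope_from_moments(mu)
print(slope, numeric_derivative)   # both should be near 1 + 2*mu
```

Here $\phi^*(\mu) = \mu + \mu^2$, so the derivative is $1 + 2\mu$ even though the micro function is not linear in $A$.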


We can similarly calculate all higher order derivatives of $\phi^*$ with
respect to $\mu_v^t$ as functions of moments of the $x_n^t$, $v(A_n^t)$ distribution.
Because these calculations increase greatly in complexity as the order of
the derivatives increases, we present only the second derivative calculation.
We first require some new notation to facilitate the formulae:

    $\Omega_{\ell\pi\pi}^t$ denotes the $M \times M$ matrix with $m, m'$ element $\sigma_{\ell mm'}^t$, $\ell=1,\ldots,M$

    $\Omega_{\pi\pi}^t$ denotes the $M \times M^2$ matrix $\Omega_{\pi\pi}^t = [\Omega_{1\pi\pi}^t, \ldots, \Omega_{M\pi\pi}^t]$

    $\Sigma_{xvv}^t$ denotes the $M \times M$ matrix with $m, m'$ element $\sigma_{xmm'}^t$

and

    $$D^t = (\Sigma_{vv}^t)^{-1}\Sigma_{xvv}^t(\Sigma_{vv}^t)^{-1} - (\Sigma_{vv}^t)^{-1}\,\Omega_{\pi\pi}^t\,\big[(\Sigma_{vv}^t)^{-1}\Sigma_{xv}^t \otimes (\Sigma_{vv}^t)^{-1}\big] \qquad (5.7)$$

We can now show


Theorem 5.5: Under Assumptions A1 and A6, the matrix of second order
partial derivatives of $\phi^*$ with respect to $\mu_v^t$ evaluated at period $t$ is given as

    $$\nabla^2_{\mu_v}\phi^* = D^t$$

where $\nabla^2_{\mu_v}\phi^*$ is the $M \times M$ matrix with $m, m'$ element $\partial^2\phi^*/\partial\mu_m\partial\mu_{m'}$.

The proof is by direct computation, with a sketch of it presented in the
Appendix.

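The formula (5.7) for the second derivatives of $\phi^*$ is sufficiently
complex to warrant illustration by a simple example. Suppose that $M = 1$,
or that $A$ is distributed according to

    $$p(A \mid \pi^t) = C(\pi^t)\,h(A)\exp(\pi^t v(A))$$

where $\pi^t$ is a scalar parameter. Here no assumption is made on the micro
functional form $x_n^t = f(A_n^t)$, other than that its expectation exists. We have
therefore

    $$E(x \mid \pi^t) = \phi(\pi^t) = \phi^*(\mu^t)$$

In accordance with Theorem 5.4, we find that

    $$\frac{\partial \phi^*}{\partial \mu} = \frac{\sigma_{x1}^t}{\sigma_{11}^t} = \operatorname{plim}\hat{b}$$

where $\hat{b}$ is the estimated coefficient from the cross section regression
$x_k = a + b\,v(A_k)$. Now

    $$\frac{\partial^2 \phi^*}{\partial \mu^2} = \frac{\partial^2 \phi}{\partial \pi^2}\Big(\frac{\partial \pi}{\partial \mu}\Big)^2 + \frac{\partial \phi}{\partial \pi}\,\frac{\partial^2 \pi}{\partial \mu^2} \qquad (5.8)$$

By Lemma 5.3, we have

    $$\frac{\partial^2 \phi}{\partial \pi^2} = \sigma_{x11}^t, \qquad \frac{\partial \mu}{\partial \pi} = \sigma_{11}^t, \qquad \frac{\partial \phi}{\partial \pi} = \sigma_{x1}^t$$

and so we must find $\partial^2\pi/\partial\mu^2$. Since

    $$\frac{\partial \mu}{\partial \pi}\,\frac{\partial \pi}{\partial \mu} = 1$$

by differentiation with respect to $\mu$ we get

    $$\frac{\partial^2 \mu}{\partial \pi^2}\Big(\frac{\partial \pi}{\partial \mu}\Big)^2 + \frac{\partial \mu}{\partial \pi}\,\frac{\partial^2 \pi}{\partial \mu^2} = 0$$

or

    $$\frac{\partial^2 \pi}{\partial \mu^2} = -\frac{\partial^2 \mu}{\partial \pi^2}\Big(\frac{\partial \pi}{\partial \mu}\Big)^3 = -\frac{\sigma_{111}^t}{(\sigma_{11}^t)^3} \qquad (5.9)$$

Inserting these values into (5.8) gives

    $$\frac{\partial^2 \phi^*}{\partial \mu^2} = \frac{\sigma_{x11}^t}{(\sigma_{11}^t)^2} - \frac{\sigma_{x1}^t\,\sigma_{111}^t}{(\sigma_{11}^t)^3}$$

which agrees with $D^t$ of (5.7) for $M = 1$.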
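
The scalar second-derivative formula can be confirmed on a case where the answer is known in closed form. The sketch below is an added illustration under stated assumptions: it again uses the Poisson family with $v(A) = A$, now with a hypothetical micro function $f(A) = A^3$, and evaluates $\sigma_{x11}/\sigma_{11}^2 - \sigma_{x1}\sigma_{111}/\sigma_{11}^3$ from exact population moments.

```python
import math

def poisson_pmf(k, mu):
    """Poisson probability of k at mean mu, computed on the log scale."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def expect(g, mu, k_max=200):
    """E[g(A)] for A ~ Poisson(mu), truncating the sum at k_max."""
    return sum(g(k) * poisson_pmf(k, mu) for k in range(k_max + 1))

def f(a):
    """Hypothetical micro function x = f(A); any f with finite moments works."""
    return a ** 3

def d_scalar(mu):
    """D for M = 1: s_x11 / s_11^2 - s_x1 * s_111 / s_11^3."""
    phi = expect(f, mu)
    s_11 = expect(lambda k: (k - mu) ** 2, mu)
    s_111 = expect(lambda k: (k - mu) ** 3, mu)
    s_x1 = expect(lambda k: (f(k) - phi) * (k - mu), mu)
    s_x11 = expect(lambda k: (f(k) - phi) * (k - mu) ** 2, mu)
    return s_x11 / s_11 ** 2 - s_x1 * s_111 / s_11 ** 3

mu = 2.0
d_value = d_scalar(mu)
# For f(A) = A^3 and Poisson A, E(x) = mu^3 + 3*mu^2 + mu,
# so the second derivative of the macro function is 6*mu + 6.
analytic_second_derivative = 6 * mu + 6
print(d_value, analytic_second_derivative)
```

Only cross section moments at a single value of $\mu$ enter `d_scalar`, which is the point of the theorem: curvature of the macro function is recovered without observing how the mean moves over time.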


As we have shown, we can express the second order derivatives of $\phi^*$
in terms of moments of the underlying exponential family population
density. This holds for arbitrary micro functional forms $x_n^t = f(A_n^t)$
obeying Assumption A1. Estimating these moments by their sample
counterparts in a cross section data base allows consistent estimation of
$\nabla^2_{\mu_v}\phi^* = D^t$ for that time period.$^{39}$ Asymptotic inferences using these
estimates are possible by standard methods.$^{40}$ Thus in particular, we can
test whether $\phi^*$ is a linear function of $\mu_v^t$.

The testing of linearity on the basis of $D^t$ extends beyond the case
of sufficient statistics, as shown in

Theorem 5.6: Assume that the moments defining $D^t$ exist, and that Assumption
A1 holds. If $x_n^t = f(A_n^t)$ is of the generalized linear form (4.7), then
$D^t = 0$.


Proof:   See  the  Appendix. 
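
A sample analogue of $D^t$ for $M = 1$ shows how Theorem 5.6 might be used in practice. The sketch below is an added illustration, not the author's procedure: the data-generating values are hypothetical, the predictor is drawn from a normal distribution with known variance, and no standard errors are computed. It estimates $D$ from sample moments for a linear and for a quadratic micro relation.

```python
import random

def d_hat(v, x):
    """Sample analogue of D for M = 1, built from centered sample moments."""
    n = len(v)
    v_bar = sum(v) / n
    x_bar = sum(x) / n
    dv = [vi - v_bar for vi in v]
    dx = [xi - x_bar for xi in x]
    s_11 = sum(d * d for d in dv) / n
    s_111 = sum(d ** 3 for d in dv) / n
    s_x1 = sum(a * b for a, b in zip(dx, dv)) / n
    s_x11 = sum(a * b * b for a, b in zip(dx, dv)) / n
    return s_x11 / s_11 ** 2 - s_x1 * s_111 / s_11 ** 3

rng = random.Random(0)   # fixed seed so the run is reproducible
n = 100_000              # hypothetical cross-section size
v = [rng.gauss(1.0, 1.0) for _ in range(n)]
e = [rng.gauss(0.0, 0.5) for _ in range(n)]

# Linear micro relation: D should be near zero.
x_linear = [1.0 + 2.0 * vi + ei for vi, ei in zip(v, e)]
# Quadratic micro relation: E(x) = 1 + 2*mu + 0.5*(mu^2 + 1), so D should be near 1.
x_quadratic = [1.0 + 2.0 * vi + 0.5 * vi * vi + ei for vi, ei in zip(v, e)]

d_linear = d_hat(v, x_linear)
d_quadratic = d_hat(v, x_quadratic)
print(d_linear, d_quadratic)
```

A formal test would compare the estimate with its asymptotic standard error; the sketch only shows that the point estimate separates the two cases.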


Thus, asymptotic inferences on the estimate of $D^t$ can be used to test
whether a generalized linear form aggregation model is consistent with
a cross section data base.$^{41}$ In particular, if $D^t = 0$ is rejected, then
the generalized linear form is rejected as inconsistent with the micro
data. Notice that this property relies on extremely weak underlying
assumptions, namely the existence of the moments required by Assumption A1, the
construction of $D^t$, and the application of the Central Limit Theorem to
the sample moments used in estimating $D^t$.


6.   CONCLUSION

The first major result of this paper is that micro slope regression
coefficients will consistently estimate the first derivatives of the true
macro relation if and only if AS holds. The AS property is seen as a
generalized aggregation condition, embodying both linear aggregation
assumptions and assumptions for sufficient statistics, as well as providing the
relevant structure for partial functional form and distribution form
assumptions.

In addition, we have shown that if the predictor averages are sufficient
for the underlying population parameters, then in principle (when the
population density is a member of the exponential family) one can empirically
characterize macro function derivatives of all orders using cross section data,
making possible a test of a linear, quadratic or some higher order nonlinear
macro function. These techniques extend to provide a general test of linear
aggregation schemes, such as the consistent and exact aggregation models.

The main appeal of these results is that they make possible an empirical
characterization of macro functions using micro data, without restrictive
modelling assumptions (besides AS). In addition, even if the true macro function
is linear, the independent effects of the average variables over time may be
difficult to identify because of trending behavior or other data problems
(referred to as multicollinearity). In this spirit, a first order
approximation of the true macro relation using average and cross section data is
provided by an exact aggregation model, as the estimates obtained from each
data source will coincide in large samples, and allow the analyst to take
advantage of the increased data input by increasing the precision of the final
estimated values. Moreover, the exact aggregation scheme can easily incorporate
structural change as indicated by additional cross section data sources.

The techniques given here can provide additional insight into the
distributional influences on macroeconomic relations. Hopefully they will help
end the practice of neglecting such issues, a practice which is now so prevalent.


Appendix:   Omitted  Proofs

Proof  of  Lemma  3.1  b) 

Lemma 3.1 b) is shown as the result of combining Lemma 3.1 a) with two
other propositions, the first of which is shown in Rao (1973) Section 6.2 a:

Lemma AP.1

Let $T_N$ be an $M$ dimensional statistic $(T_{1N},\ldots,T_{MN})'$ such that the
asymptotic distribution of $\sqrt{N}(T_{1N} - \gamma_1),\ldots,\sqrt{N}(T_{MN} - \gamma_M)$ is $M$-variate normal
with mean zero and variance covariance matrix $\Sigma_T$. Further, let $g(T_{1N},\ldots,T_{MN},N)$
be a function which is totally differentiable in $T_{1N},\ldots,T_{MN}$, and suppose that
$\nabla_{T_N} g \to G \neq 0$ as both $N \to \infty$ and $T_N \to (\gamma_1,\ldots,\gamma_M)' = \gamma$. Then the asymptotic
distribution of

    $$\sqrt{N}\big(g(T_{1N},\ldots,T_{MN},N) - g(\gamma_1,\ldots,\gamma_M,N)\big)$$

is the same as that of

    $$\sqrt{N}(T_N - \gamma)'G$$

that is, normal with mean zero and variance

    $$G'\Sigma_T G$$

Moreover, $g(\gamma_1,\ldots,\gamma_M,N)$ may be replaced in the above by $g^*(\gamma_1,\ldots,\gamma_M)$ if

    $$\lim_{N \to \infty} \sqrt{N}\big[g(\gamma_1,\ldots,\gamma_M,N) - g^*(\gamma_1,\ldots,\gamma_M)\big] = 0$$

Lemma AP.2

    $$\lim_{N \to \infty} \sqrt{N}\big(\tilde{x}(\mu_v, \mu_v, N) - \phi^*(\mu_v)\big) = 0$$

Proof: Fix $N$ and consider

    $$E\big[\sqrt{N}(\bar{x} - \phi^*(\mu_v)) - \sqrt{N}(\tilde{x}(\bar{v}, \mu_v, N) - \tilde{x}(\mu_v, \mu_v, N))\big]$$

    $$= E\big(\sqrt{N}(\bar{x} - \tilde{x}(\bar{v}, \mu_v, N))\big) + \sqrt{N}\big(\tilde{x}(\mu_v, \mu_v, N) - \phi^*(\mu_v)\big)$$

    $$= 0 + \sqrt{N}\big(\tilde{x}(\mu_v, \mu_v, N) - \phi^*(\mu_v)\big)$$

Now as $N \to \infty$, the first expectation approaches zero by virtue of Lemma 3.1 a)
and Lemma AP.1 applied to $\tilde{x}$. Thus

    $$\lim_{N \to \infty} \sqrt{N}\big(\tilde{x}(\mu_v, \mu_v, N) - \phi^*(\mu_v)\big) = 0$$

QED

Applying Lemma AP.1 to $\tilde{x}$ in view of Lemma 3.1 a) and AP.2 gives Lemma 3.1 b).


Additional  conditions  for  Theorem  4.1: 

Let

    $$\bar{P}(A_1^t,\ldots,A_{N^t}^t \mid \psi, \mu_v^t) = \frac{\prod_{n=1}^{N^t} p^*(A_n^t \mid \mu_v^t)}{P(\psi \mid \mu_v^t)}, \qquad \bar{v}^t = \psi$$

be the distribution of $A_1^t,\ldots,A_{N^t}^t$ conditional on $\bar{v}^t = \psi$. Let $e_i$ be the
$M$-vector with $i$th component 1 and all other components 0, $i=1,\ldots,M$. Assume
$p^*$ and $\bar{P}$ are differentiable with respect to each component of $\mu_v^t$, and that
the difference quotients

i)   r(P(*I'l''Vi   +  e.h)  -  p{'\i),\i    )) 
h        VI  V 

ii)   ^  (p(«li|;,y  +  e.h)  -  p('Ii|;,y  )) 
h     '    V    1  V 

are all bounded by integrable functions of $A_1^t,\ldots,A_{N^t}^t$, for $0 < |h| < h_0$.




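Proof of Theorem 4.1:

Condition ii) above allows differentiation of

$$\tilde{x}(\bar{v}^t,\mu_v^t,N^t) = E(\bar{x}^t \mid \bar{v}^t) = \int \bar{x}^t\,\bar{P}(A_1^t,\ldots,A_{N^t}^t \mid \bar{v}^t,\mu_v^t)\,dA_1^t \cdots dA_{N^t}^t$$

under the integral sign, which gives

$$\nabla_{\mu_v}\tilde{x} = E(\bar{x}^t\,\Psi^t \mid \bar{v}^t,\mu_v^t)$$

where

$$\Psi^t = \sum_{n=1}^{N^t} \nabla_{\mu_v} \ln p^*(A_n^t \mid \mu_v^t) - \nabla_{\mu_v} \ln P(\bar{v}^t \mid \mu_v^t)$$

Theorem 4.1 is shown if

$$E(\Psi^t \mid \bar{v}^t,\mu_v^t) = 0 \tag{AP.1}$$

By condition i) above, we can differentiate

$$1 = E(1 \mid \bar{v}^t,\mu_v^t)$$

under the integral sign, which gives (AP.1) above.

QED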
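
The last step of this proof is the familiar argument that the gradient of a log density has mean zero once differentiation under the integral sign is allowed. The sketch below checks the unconditional, univariate analogue numerically. It is an illustration only: the normal density, the grid, and the function names are assumptions made for the example and do not appear in the paper.

```python
import numpy as np


def normal_density(a, mu, sigma=1.0):
    """Univariate normal density, an illustrative stand-in for p*(A | mu)."""
    z = (a - mu) / sigma
    return np.exp(-0.5 * z ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def expected_score(mu, sigma=1.0, h=1e-5, half_width=12.0, points=200001):
    """Approximate E[d ln p(A | mu) / d mu] with a Riemann sum on a fine grid."""
    grid = np.linspace(mu - half_width * sigma, mu + half_width * sigma, points)
    step = grid[1] - grid[0]
    density = normal_density(grid, mu, sigma)
    # Central difference of ln p with respect to the mean parameter.
    log_up = np.log(normal_density(grid, mu + h, sigma))
    log_down = np.log(normal_density(grid, mu - h, sigma))
    score = (log_up - log_down) / (2.0 * h)
    return float(np.sum(score * density) * step)


if __name__ == "__main__":
    # The integral of score * density should be numerically close to zero.
    print(expected_score(0.7), expected_score(-1.5, sigma=2.0))
```

The conditional statement (AP.1) has the same structure, with the conditional density of the population configuration given the sample mean in place of the normal density used here.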


Proof  Sketch  for  Theorem  5.5 


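Denote the components of $\pi^t = g^{-1}(\mu_v^t)$ by $g^{-1}(\mu_v^t) = (g_1^{-1}(\mu_v^t),\ldots,g_M^{-1}(\mu_v^t))'$.

As in Theorem 5.4

$$\nabla_{\mu_v}\phi^* =
\begin{bmatrix} \dfrac{\partial\phi^*}{\partial\mu_1} \\ \vdots \\ \dfrac{\partial\phi^*}{\partial\mu_M} \end{bmatrix}
=
\begin{bmatrix}
\dfrac{\partial g_1^{-1}}{\partial\mu_1} & \cdots & \dfrac{\partial g_M^{-1}}{\partial\mu_1} \\
\vdots & & \vdots \\
\dfrac{\partial g_1^{-1}}{\partial\mu_M} & \cdots & \dfrac{\partial g_M^{-1}}{\partial\mu_M}
\end{bmatrix}
\begin{bmatrix} \dfrac{\partial\phi}{\partial\pi_1} \\ \vdots \\ \dfrac{\partial\phi}{\partial\pi_M} \end{bmatrix}$$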


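Therefore

$$\frac{\partial^2\phi^*}{\partial\mu_i\,\partial\mu_j}
= \sum_{m=1}^{M} \frac{\partial\phi}{\partial\pi_m}\,\frac{\partial^2 g_m^{-1}}{\partial\mu_i\,\partial\mu_j}
+ \sum_{m=1}^{M}\sum_{m'=1}^{M} \frac{\partial^2\phi}{\partial\pi_m\,\partial\pi_{m'}}\,\frac{\partial g_m^{-1}}{\partial\mu_i}\,\frac{\partial g_{m'}^{-1}}{\partial\mu_j} \tag{AP.2}$$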


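The second term of the above (the double sum) is expressible in full
matrix format as

$$\begin{bmatrix}
\dfrac{\partial g_1^{-1}}{\partial\mu_1} & \cdots & \dfrac{\partial g_M^{-1}}{\partial\mu_1} \\
\vdots & & \vdots \\
\dfrac{\partial g_1^{-1}}{\partial\mu_M} & \cdots & \dfrac{\partial g_M^{-1}}{\partial\mu_M}
\end{bmatrix}
\begin{bmatrix}
\dfrac{\partial^2\phi}{\partial\pi_1^2} & \cdots & \dfrac{\partial^2\phi}{\partial\pi_1\,\partial\pi_M} \\
\vdots & & \vdots \\
\dfrac{\partial^2\phi}{\partial\pi_M\,\partial\pi_1} & \cdots & \dfrac{\partial^2\phi}{\partial\pi_M^2}
\end{bmatrix}
\begin{bmatrix}
\dfrac{\partial g_1^{-1}}{\partial\mu_1} & \cdots & \dfrac{\partial g_1^{-1}}{\partial\mu_M} \\
\vdots & & \vdots \\
\dfrac{\partial g_M^{-1}}{\partial\mu_1} & \cdots & \dfrac{\partial g_M^{-1}}{\partial\mu_M}
\end{bmatrix}$$

which by Lemma 5.3 equals

$$(\Sigma_{vv})^{-1}\,\Sigma_{xvv}\,(\Sigma_{vv})^{-1} \tag{AP.3}$$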


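giving the second term in the statement of the theorem.  Now if $B = [b_{ij}(\mu)]$
is an $M \times M$ matrix of functions of $\mu$ then we denote by $D_{\mu_m}B$ the matrix

$$D_{\mu_m}B = \left[\frac{\partial b_{ij}(\mu)}{\partial\mu_m}\right]$$

The first term of (AP.2) is expressible in matrix format as

$$\bigl[D_{\mu_1}(dg^{-1})\,\nabla_\pi\phi,\ \ldots,\ D_{\mu_M}(dg^{-1})\,\nabla_\pi\phi\bigr] \tag{AP.4}$$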


Now, in order to evaluate $D_{\mu_m}(dg^{-1})$, $m = 1,\ldots,M$, we use the relation
$(dg^{-1})(dg) = I_M$ so that (if $0_M$ is an $M \times M$ matrix of zeros)

$$D_{\mu_m}(dg^{-1})\,dg + dg^{-1}\,\bigl(D_{\mu_m}(dg)\bigr) = 0_M, \qquad m = 1,\ldots,M$$

or

$$D_{\mu_m}(dg^{-1}) = -dg^{-1}\,\bigl(D_{\mu_m}(dg)\bigr)\,dg^{-1}$$


Now, if $g_{m\pi\pi}$ denotes the $M \times M$ matrix with $i,j$ element $\dfrac{\partial^2 g_m}{\partial\pi_i\,\partial\pi_j}$, we express
$D_{\mu_m}(dg)$ as

$$D_{\mu_m}(dg) = \bigl[g_{1\pi\pi}\,D_{\mu_m}g^{-1},\ \ldots,\ g_{M\pi\pi}\,D_{\mu_m}g^{-1}\bigr]'$$

so that

$$D_{\mu_m}(dg^{-1}) = -dg^{-1}\,\bigl[g_{1\pi\pi}\,D_{\mu_m}g^{-1},\ \ldots,\ g_{M\pi\pi}\,D_{\mu_m}g^{-1}\bigr]'\,dg^{-1} \tag{AP.5}$$


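The proof is completed by inserting (AP.5) into (AP.4), making the associations

$$dg^{-1} = (\Sigma_{vv})^{-1}; \qquad \nabla_\pi\phi = \Sigma_{xv}$$

$$g_{m\pi\pi} = \Omega_{m\pi\pi}; \qquad m = 1,\ldots,M$$

$$\bigl[D_{\mu_1}g^{-1},\ \ldots,\ D_{\mu_M}g^{-1}\bigr] = dg^{-1} = [\Sigma_{vv}]^{-1}$$

and rewriting the whole expression in terms of $\Omega_{\pi\pi}$.


QED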
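
The proof sketch above relies on the rule for differentiating a matrix inverse, obtained by differentiating $(dg^{-1})(dg) = I_M$. A minimal numerical check of that rule follows. The 2x2 matrix function `dg` of a single scalar `mu` is a hypothetical example chosen only because it is smooth and always invertible; it is not taken from the paper.

```python
import numpy as np


def dg(mu):
    """Hypothetical smooth, invertible 2x2 matrix function of a scalar mu."""
    return np.array([
        [2.0 + mu ** 2, np.sin(mu)],
        [np.sin(mu), 3.0 + np.exp(-mu)],
    ])


def numerical_derivative(matrix_fn, mu, h=1e-6):
    """Elementwise central-difference derivative of a matrix-valued function."""
    return (matrix_fn(mu + h) - matrix_fn(mu - h)) / (2.0 * h)


def inverse_rule_gap(mu=0.4):
    """Largest absolute gap between D(dg^-1) and -dg^-1 (D dg) dg^-1."""
    inverse = np.linalg.inv(dg(mu))
    lhs = numerical_derivative(lambda m: np.linalg.inv(dg(m)), mu)
    rhs = -inverse @ numerical_derivative(dg, mu) @ inverse
    return float(np.max(np.abs(lhs - rhs)))


if __name__ == "__main__":
    # The gap should be at the level of finite-difference error.
    print(inverse_rule_gap(0.4))
```

In the proof the derivative is taken with respect to each component $\mu_m$ in turn, so the scalar `mu` here plays the role of one such component.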


Proof  of  Theorem  5.6: 

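For the generalized linear model (4.7), we have

$$\Sigma_{xv}^t = \Sigma_{vv}^t\,a(\theta^t);$$

so

$$\Sigma_{xvv}^t = \bigl[\Omega_{1\pi\pi}\,a(\theta^t),\ \ldots,\ \Omega_{M\pi\pi}\,a(\theta^t)\bigr]$$

$$D^t = (\Sigma_{vv}^t)^{-1}\bigl[\Omega_{1\pi\pi}\,a(\theta^t),\ \ldots,\ \Omega_{M\pi\pi}\,a(\theta^t)\bigr](\Sigma_{vv}^t)^{-1}
- (\Sigma_{vv}^t)^{-1}(\Omega_{\pi\pi})\bigl(a(\theta^t) \otimes (\Sigma_{vv}^t)^{-1}\bigr)$$

since $(\Sigma_{vv}^t)^{-1}\Sigma_{xv}^t = a(\theta^t)$.  Now, by symmetries in the construction of $\Omega_{\pi\pi}$,
we have

$$\bigl[\Omega_{1\pi\pi}\,a(\theta^t),\ \ldots,\ \Omega_{M\pi\pi}\,a(\theta^t)\bigr](\Sigma_{vv}^t)^{-1}
= \Omega_{\pi\pi}\bigl(a(\theta^t) \otimes (\Sigma_{vv}^t)^{-1}\bigr)$$

by direct computation, which gives $D^t = 0$.


QED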
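
The "direct computation" in this proof can be illustrated numerically. The sketch below assumes that $\Omega_{\pi\pi}$ is the $M \times M^2$ matrix $[\Omega_{1\pi\pi}, \ldots, \Omega_{M\pi\pi}]$ and that the $\Omega_{m\pi\pi}$ are slices of a fully symmetric third-order array; both are our reading of the notation, not something stated in this appendix. The random inputs and all function names are hypothetical.

```python
import numpy as np


def symmetric_third_order(M, seed=0):
    """Random M x M x M array that is symmetric in all three indices."""
    rng = np.random.default_rng(seed)
    raw = rng.normal(size=(M, M, M))
    perms = [(0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)]
    return sum(np.transpose(raw, p) for p in perms) / 6.0


def both_sides(M=3, seed=0):
    """Return the two sides of the identity used in the proof of Theorem 5.6."""
    rng = np.random.default_rng(seed + 1)
    tensor = symmetric_third_order(M, seed)
    omegas = [tensor[:, :, m] for m in range(M)]      # stand-ins for Omega_m
    a = rng.normal(size=(M, 1))                       # stand-in for a(theta)
    root = rng.normal(size=(M, M))
    sigma_inv = np.linalg.inv(root @ root.T + M * np.eye(M))
    # Left side: [Omega_1 a, ..., Omega_M a] Sigma^{-1}
    lhs = np.hstack([omega @ a for omega in omegas]) @ sigma_inv
    # Right side: [Omega_1, ..., Omega_M] (a kron Sigma^{-1})
    rhs = np.hstack(omegas) @ np.kron(a, sigma_inv)
    return lhs, rhs


if __name__ == "__main__":
    lhs, rhs = both_sides(M=3)
    print(np.max(np.abs(lhs - rhs)))
```

If the third-order array is not symmetrized, the two sides generally differ, which is why the symmetry of $\Omega_{\pi\pi}$ is the operative fact in the proof.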
FOOTNOTES


1.  One  of  the  reasons  Friedman's  book  The  Theory  of  the  Consumption  Function 
is so masterful is that the distributional foundation is clearly stated
and  investigated  empirically  with  both  macro  and  micro  data,  although  not 
using  pooled  methods  as  advocated  here.   Other  early  works  in  demand  analysis 
which  estimated  income  elasticities  from  cross  section  data  and  applied  them 
to  time  series  analysis  were  Wold  (1953)  and  various  work  of  Stone,  although 
these  authors  did  not  use  aggregating  models  specifically.   A  recent  demand 
application  of  an  exact  aggregation  model  is  Jorgenson,  Lau  and  Stoker  (1979) . 

2.  This   critique  applies  equally  well  to  studies  of  aggregate  variables  such 
as national income, total personal consumption expenditures, etc.

3.  This  becomes  a  major  empirical  problem  when  there  are  several  predictor 
variables, as then the full (multivariate) distribution of underlying
attributes must be characterized.  Moreover, if the cross section data is
available  for  only  one  time  period,  the  underlying  distribution  is  held 
constant,  and  so  distribution  movements  over  time  cannot  be  captured  by 
this  process. 

4.  See  Theil  (1954,  1975),  Green  (1964),  Gorman  (1953),  Muellbauer  (1975,  1977) 
and  Lau  (1980) . 

5.  The  theory  of  sufficient  statistics  is  presented  in  most  standard  textbooks  on 
mathematical statistics; cf. Lehmann (1959), Ferguson (1967) or Rao (1973).

6.  p may just be taken as the density of the sample distribution in the population.
However, with N sufficiently large, p may be taken as a continuous approximation
to this density.  We utilize this framework in order to allow structure
to be given to the population configuration $\{A_n^t \mid n = 1,\ldots,N^t\}$ via $p(A \mid \theta^t)$.
7.  See Rao (1973), section 2c.3 for a statement of the Weak Law of Large Numbers.

8.  Each index k of the random sample has a counterpart n index in the population
($n = 1,\ldots,N^t$) numbering.  We utilize the k indices only when discussing statistics
of the cross section.

9.  Typical  numbers  for  a  study  of  U.S.  family  demand  behavior  are  N  =70 
million  for  1972,  with  a  budget  study  of  size  K  =  10,000. 

10.  See  Section  4. 

11.  For instance, Jorgenson, Lau and Stoker (1979) differentiate individual
families on the basis of 17 income and demographic variables.

12.  For example, if $\bar{v}^t$ is a sufficient statistic for the distributional
parameters $\theta^t$; cf. Section 4.

13.  Variables common to all families, such as prices, can be entered as
parameters of f, as in Section 3.2.  If prices vary over families, they should
be considered as components of $v(A_n^t)$.

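14.  $\nabla_{\bar{v}}\tilde{x}$ represents the gradient of $\tilde{x}$ with respect to $\bar{v}$, i.e. the $M$-vector with
$i$th component

$$\left.\frac{\partial\tilde{x}}{\partial\bar{v}_i}\right|_{\bar{v},\,\mu_v,\,N}$$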


15.  Rao  (1973)   section  2c  is  an  excellent  reference  for  these  theorems;  also, 
see  section  6a  for  some  useful  corollaries. 

16.  It is useful to point out that our underlying population assumptions give
$b_K$ a slightly different asymptotic distribution than in the standard linear
model.  In particular, $\sqrt{K}\,(b_K - G(\mu_v^{t_0}))$ approaches a normal vector as $K \to \infty$
with mean zero and variance covariance matrix

$$\Sigma_b = (\Sigma_{vv}^{t_0})^{-1}\bigl(\Sigma_{(xv)(xv)}^{t_0}\bigr)(\Sigma_{vv}^{t_0})^{-1}$$

where $\Sigma_{(xv)(xv)}^{t_0}$ is the matrix with $mm'$ element

$$E\Bigl[\bigl((x_k^{t_0} - \phi^*)(v_m(A_k^{t_0}) - \mu_m^{t_0}) - \sigma_{xm}^{t_0}\bigr)\bigl((x_k^{t_0} - \phi^*)(v_{m'}(A_k^{t_0}) - \mu_{m'}^{t_0}) - \sigma_{xm'}^{t_0}\bigr)\Bigr]$$

$\Sigma_b$ will correspond to the usual expression (i.e. $\sigma^2(\Sigma_{vv}^{t_0})^{-1}$, where $\sigma^2$ is residual
variance) if there is a zero correlation between $u_k^2$ and
$(v_i(A_k^{t_0}) - \mu_i^{t_0})(v_j(A_k^{t_0}) - \mu_j^{t_0})$ for $i, j = 1,\ldots,M$, where
$u_k = x_k^{t_0} - \bar{x}_K^{t_0} - (v(A_k^{t_0}) - \bar{v}_K^{t_0})'\,b_K$.
Use of the standard estimators may provide an adequate approximation to $\Sigma_b$
if the sample counterparts to these correlations are small.

17.  Suppose that $x_n^t$ is functionally related to $v(A_n^t)$, i.e. there exists $f^*$
such that $x_n^t = f(A_n^t) = f^*(v(A_n^t))$.  A related but different question than
that asked here is: under what conditions will $\operatorname{plim} b_K = \nabla_v f^*(\mu_v^{t_0})$?  This
problem is addressed by White (1978), where relatively restrictive conditions
on $f^*$ and $p$ are found.

18.  The  definition  of  uniform  convergence  can  be  found  in  Apostol  (1967)  ,  p. 
424  and  Buck  (1965),  p.  180-2. 

19.  This  standard  result  of  analysis  is  available  in  most  books  on  advanced 
calculus, cf. Buck (1965), section 4.2 (Theorem 21 in particular).

20.  The  only  exception  known  to  this  author  is  Friedman's  permanent  income  - 
permanent  consumption  model.   See  example  3  of  Section  4  (errors  in 
variables)  for  illustration  of  this  fact. 

21.  See Rao (1973), Section 5a.2 for the usual statement of the Rao-Blackwell
Theorem.   Note  2  (p.  321-322)  verifies  the  property  referred 
to  here,  as  pointed  out  by  Arnold  and  Katti  (1972) . 


22.  With respect to any convex loss function, e.g. minimum variance.  See
Rao (1973), p. 322.

23.  i.e. Assumption A2 is replaced by the full invertibility of the function

$$\mu_v^t = g(\theta^t)$$

where $\theta^t = (\theta_1^t, \theta_0^t)'$.  Inverting gives

$$\theta_1^t = g^{-1}(\mu_v^t, \theta_0^t)$$

and so $\theta_1^t$ can be replaced by $g^{-1}(\mu_v^t, \theta_0^t)$ in forming $\phi^*$.

24.  For  example,  the  stylized  fact  that  the  coefficient  of  variation  of  the 
U.S. log-income distribution is roughly constant.

25.  If (3.16) is replaced by

$$\lim \tilde{x}(\psi, \mu_v^t, N^t, \theta_0^t) = \phi^*(\psi) \qquad \text{for all } \psi,\ \mu_v^t \text{ and all } \theta_0^t$$

this second problem is avoided.  However, this structure is more
restrictive than (3.16) in the text, and depends on the precise role
of $\theta_0$ in $p$ and the $(\mu_v, \theta_0) \leftrightarrow (\theta_1, \theta_0)$ reparameterization.

26.  Recall that $A_n^t$ is just used to signify dependence on the underlying
distribution $p(A \mid \theta^t)$.

27.  The  basic  form  (4.1)  represents  the  "perfect"  aggregation  conditions  of 
Theil  (1953)  and  Green  (1964). 

28.  This  reflects  the  first  extension  discussed  in  Section  3.3. 

29.  Theil (1953, 1975) assumes $a(A_n^t)$ uncorrelated with $v(A_n^t)$, which gives
a linear macro function.  AS requires that squared terms involving the
components of $v(A_n^t)$ are uncorrelated with $a(A_n^t)$, and so we assume independence,
although weaker conditions may suffice.

30.  The independence assumption allows the (derived) distribution of
$a_0(A_n^t)$, $a(A_n^t)$ and $v(A_n^t)$ to be written as the product of the marginal
distribution of $a_0(A_n^t)$, $a(A_n^t)$ and the marginal distribution of
$v(A_n^t)$.  We assume that the marginal distribution of $a_0(A_n^t)$ and $a(A_n^t)$
has a constant mean over time $t$.

31.  That is, the means of the marginal coefficient distribution referred
to in footnote 30 are determined by distributional parameters other
than $\theta_1^t = g^{-1}(\mu_v^t, \theta_0^t)$ of footnote 23.


32.  Simple specification analysis techniques verify this formally: if one
estimates

$$x_k^{t_0} = b_0 + b_1 A_k^{t_0}$$

with (4.8) as the true model, then $\operatorname{plim} b_1 = a_1 + 2a_2\,\mu^{t_0}$.

t        * 

33.  In the notation of footnote 32, we have $\operatorname{plim} b_1 = a_1 + 2a_2\,\mu^{t_0} \neq \nabla_{\mu}\phi^* =$

a^  +  '2-3^^+  2ei^i:\^. 

34.  For  definitions  and  further  discussions  of  sufficient  statistics,  see 
Lehmann  (1959),  Rao  (1973)  and  Ferguson  (1967). 


35.   Lau  (1980)  mentions  sufficient  statistics  in  some  concluding  remarks. 
However,  his  framework  is  not  general  enough  to  precisely  describe  the 
role  of  sufficient  statistics  in  aggregation,  as  is  done  here.   Actually, 
the sufficient statistic structure underlies the model in Houthakker (1956).
This type of model, arrived at by direct integration of a behavioral function
over a specific distribution, has appeared in several works, as
surveyed  by  Fisher  (1969),  with  a  recent  example  MacDonald  and  Lawrence 
(1978). 
36.  Briefly, the regularity conditions required are that the range of variation
of $A_n$ does not depend on $\theta^t$, a continuously differentiable sufficient statistic
for $\theta^t$ exists and $p(A \mid \theta^t)$ is continuously differentiable in $A$ and $\theta^t$,
plus some conditions on the dimension of possible variation in $A_n$.  Under
these conditions $p(A \mid \theta^t)$ must have the form (4.11) locally.  If $p(A \mid \theta^t)$ is
further assumed to be analytic, (4.11) is the global form of the density.
For an excellent paper that proves this theorem in more generality than that
needed here, see Barankin and Maitra (1963).

37.  The exponential family form (4.11) is quite general.  Examples of univariate
distributions expressible in this form are the normal $(\mu, \sigma^2)$, Poisson $(\mu)$,
negative binomial $(r, \theta)$, the gamma distributions and the beta distributions.
Examples of multivariate distributions expressible in this form include
the normal with mean $\mu$ and variance covariance matrix $\Sigma$.  Distributions
which are not of the form (4.11) include the uniform and Cauchy distributions.
See Ferguson (1967) for more details.

38.  Actually the formulae involving the first and second order derivatives
of $-\ln C$ appear as an exercise in Lehmann (1959), p. 58, problem 14.

39.  Here we are referring to using the method of moments for estimating $D$.
A potential empirical problem with this approach is that the sample
variances of high order moments can be quite large.  See Kendall and
Stuart (1963) p. 234 for a discussion of this problem.  While $D$
incorporates only third order moments, extensions of our methodology
to higher order derivatives of $\phi^*$ will involve fourth and higher order
moments, and thus the sampling variability problem of the method of
moments may be more critical.

40.  This is because the formula (5.7) is a continuous and differentiable
function  of  the  moments  comprising  it.   "Standard  methods"  refer  to 
applications  of  theorems  such  as  Lemma  AP.l. 




41.  Although $D$ of (5.7) is directly estimable from cross section moments,
it would be useful if $D$ could be related to simpler statistics, such as
regression coefficients.  In this sense it is easily shown that if $M = 1$,
performing the micro regression

$$x_k^{t_0} = c_0 + c_1\,v_1(A_k^{t_0}) + c_2\,\bigl(v_1(A_k^{t_0})\bigr)^2$$

gives

$$\operatorname*{plim}_{K\to\infty} c_2 = \frac{\sigma_{x11}^{t_0}\,\sigma_{11}^{t_0} - \sigma_{111}^{t_0}\,\sigma_{x1}^{t_0}}{\sigma_{11}^{t_0}\,\sigma_{1111}^{t_0} - (\sigma_{111}^{t_0})^2 - (\sigma_{11}^{t_0})^3}$$

which is proportional to (5.9), and thus provides an easily computable
test of (5.9) equaling zero (although bear in mind stochastic structure
differences, as in fn 16).  The natural conjecture is that including
all squared and cross product terms in a micro regression produces
coefficients which consistently estimate $D$ up to a proportion matrix.
Unfortunately, proving or disproving this result is a computational nightmare,
and to date the author has not solved this problem.
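
The moment expression in footnote 41 for the coefficient on the squared term can be checked in-sample, since the OLS coefficient equals the same ratio built from sample central moments. The sketch below is illustrative only: the simulated data, the parameter values, and the function names are assumptions made for the example, and the moment names mirror our reading of the $\sigma$ subscripts.

```python
import numpy as np


def central_moments(x, v):
    """Sample central (cross) moments for the M = 1 case."""
    dx = x - x.mean()
    w = v - v.mean()
    return {
        "s_x1": np.mean(dx * w),        # covariance of x and v
        "s_x11": np.mean(dx * w ** 2),  # third-order cross moment
        "s_11": np.mean(w ** 2),        # variance of v
        "s_111": np.mean(w ** 3),       # third central moment of v
        "s_1111": np.mean(w ** 4),      # fourth central moment of v
    }


def moment_formula(x, v):
    """Coefficient on v squared written in terms of central moments."""
    m = central_moments(x, v)
    numerator = m["s_x11"] * m["s_11"] - m["s_111"] * m["s_x1"]
    denominator = m["s_11"] * m["s_1111"] - m["s_111"] ** 2 - m["s_11"] ** 3
    return float(numerator / denominator)


def quadratic_coefficient(x, v):
    """OLS coefficient on v squared from regressing x on (1, v, v squared)."""
    design = np.column_stack([np.ones_like(v), v, v ** 2])
    coefficients, *_ = np.linalg.lstsq(design, x, rcond=None)
    return float(coefficients[2])


def simulate(n=5000, seed=0):
    """Hypothetical cross section with a skewed predictor."""
    rng = np.random.default_rng(seed)
    v = rng.gamma(shape=2.0, scale=1.0, size=n)
    x = 1.0 + 0.5 * v + 0.3 * v ** 2 + rng.normal(size=n)
    return x, v


if __name__ == "__main__":
    x, v = simulate()
    print(quadratic_coefficient(x, v), moment_formula(x, v))
```

The two printed numbers should agree up to floating-point error for any sample, because the equality is an algebraic identity; replacing sample moments by their probability limits then gives the plim expression.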